Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-28 Thread Charles Chen
Thank you everyone for your work on this release.  I'm pleased to announce
that the 2.7.0 RC3 is approved for release with 3 PMC +1 votes and no -1
votes.
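
The passing rule quoted in the vote email below (majority approval, with at least 3 PMC affirmative votes) can be sketched roughly as follows. This is only an illustration: the function name is made up, and treating "majority" as more binding +1 than binding -1 votes is an assumption, not something spelled out in this thread.

```python
def release_vote_passes(binding_plus, binding_minus):
    """Hypothetical helper: True if a release vote passes under the quoted rule."""
    # At least 3 PMC (binding) +1 votes are required.
    has_enough_approvals = binding_plus >= 3
    # Majority approval: assumed here to mean more binding +1 than binding -1.
    has_majority = binding_plus > binding_minus
    return has_enough_approvals and has_majority


# RC3 as announced above: 3 PMC +1 votes and no -1 votes.
print(release_vote_passes(3, 0))
```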

On Thu, Sep 27, 2018 at 5:31 AM Łukasz Gajowy 
wrote:

> +1
>
> I once again looked at the Nexmark dashboards, it seems that there are no
> performance regressions.
>
> On Thu, Sep 27, 2018, 00:02 Jean-Baptiste Onofré
> wrote:
>
>> +1 (binding)
>>
>> Regards
>> JB
>> On Sep 26, 2018, at 18:00, Ahmet Altay wrote:
>>>
>>> +1. Thank you all!
>>>
>>> On Wed, Sep 26, 2018 at 2:33 PM, Charles Chen  wrote:
>>>
 +1. Performed additional validations as listed in the spreadsheet.


 On Wed, Sep 26, 2018, 3:24 AM Robert Bradshaw < rober...@google.com>
 wrote:

> +1 (binding), same verification as before.
>
> On Wed, Sep 26, 2018 at 7:36 AM Charles Chen < c...@google.com> wrote:
>
>> To clarify, the only difference between RC2 and RC3 is the Python
>> fix  https://github.com/apache/beam/pull/6494.
>>
>> This means that the Java validations from RC2 should carry over,
>> though I reran validations with RC3 anyway, as detailed on the 
>> spreadsheet.
>>
>> On Wed, Sep 26, 2018 at 12:41 AM Charles Chen < c...@google.com>
>> wrote:
>>
>>> As before, please add any validation performed to the
>>> spreadsheet here:
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1675964688
>>>
>>> On Wed, Sep 26, 2018 at 12:30 AM Charles Chen < c...@google.com>
>>> wrote:
>>>
 Hi everyone,

 Please review and vote on the release candidate #3 for the version
 2.7.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific
 comments)

 The complete staging area is available for your review, which
 includes:
 * JIRA release notes [1],
 * the official Apache source release to be deployed to
 dist.apache.org [2], which is signed with the key with fingerprint
 45C60AAAD115F560 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.7.0-RC3" [5],
 * website pull request listing the release and publishing the API
 reference manual [6].
 * Java artifacts were built with Gradle 4.8 and OpenJDK
 1.8.0_181-8u181-b13-1~deb9u1.
 * Python artifacts are deployed along with the source release to
 the dist.apache.org [2].

 The vote will be open for at least 72 hours. It is adopted by
 majority approval, with at least 3 PMC affirmative votes.

 Thanks,
 Charles

 [1]
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12343654
 [2] https://dist.apache.org/repos/dist/dev/beam/2.7.0
 [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
 [4]
 https://repository.apache.org/content/repositories/orgapachebeam-1048/
 [5] https://github.com/apache/beam/tree/v2.7.0-RC3
 [6] https://github.com/apache/beam-site/pull/549

>>>
>>>


Re: Tracking and resolving release blocking bugs

2018-09-28 Thread Ahmet Altay
This sounds reasonable to me.

On Fri, Sep 28, 2018 at 5:39 PM, Connell O'Callaghan 
wrote:

> Ismaël and Ahmet thank you both for your responses!!!
>
> Can the text on the site today - reproduced below - be enhanced with the
> text in bold? Note I would not expect this new text to be in bold once
> deployed to the site.
> Triage release-blocking issues in JIRA
>
> There could be outstanding release-blocking issues, which should be
> triaged before proceeding to build a release candidate. We track them by
> assigning a specific Fix version field even before the issue is resolved.
>
> The list of release-blocking issues is available at the version status
> page.
> Triage each unresolved issue with one of the following resolutions:
>
>- If the issue has been resolved and JIRA was not updated, resolve it
>accordingly.
>- If the issue has not been resolved and it is acceptable to defer
>this until the next release, update the Fix Version field to the new
>version you just created. Please consider discussing this with stakeholders
>and the dev@ mailing list, as appropriate.
>- If the issue has not been resolved and it is not acceptable to
>release until it is fixed, the release cannot proceed. Instead, work with
>the Beam community to resolve the issue.
>
> *If a bug is found in the RC creation process/tools, it should be
> considered high priority and fixed within 7 days.*
>
> This is to prevent a regression in the amount of time it takes to cut an
> RC.
>
> On Fri, Sep 21, 2018 at 2:06 PM Ahmet Altay  wrote:
>
>> I agree with Ismaël, this process is generally working well. As an
>> improvement we could document it somewhere.
>>
>> One other area I think we can improve is, issues that are related to the
>> release process. For example, at times we will have issues that require
>> doing additional manual steps in the release process, we will file a JIRA
>> and we tend to punt those JIRAs release after release as long as there is a
>> manual work around. However, these issues add to the cost of cutting
>> releases. I would propose prioritizing those issues that are identified
>> during the release and about the release process itself.
>>
>> Ahmet
>>
>> On Wed, Sep 19, 2018 at 4:41 AM, Ismaël Mejía  wrote:
>>
>>> We have done this so far by leaving the JIRA issues 'Open' with the
>>> 'Fix version' corresponding to the upcoming release, and we track the
>>> progress between the branch cut and the first release candidate with
>>> the assigned parties. This process has long been the de facto standard
>>> and has worked smoothly so far. More info here:
>>>
>>> https://beam.apache.org/contribute/release-guide/#triage-release-blocking-issues-in-jira
>>>
>>> Is there something missing, or do you have other ideas in mind to
>>> improve it?
>>>
>>> On Wed, Sep 19, 2018 at 2:34 AM Connell O'Callaghan 
>>> wrote:
>>> >
>>> > Hi All
>>> >
>>> > In order to allow successful and smooth deployment of the latest Beam
>>> releases, is the community OK with tracking bugs that block releases, with a
>>> goal of resolving such bugs within a week? If there is general agreement (or
>>> no major objections) on this, we will edit the contributor page using
>>> language similar to the "Stale pull requests" section early next week.
>>> >
>>> > Thank you all,
>>> > - Connell
>>>
>>
>>


Re: Tracking and resolving release blocking bugs

2018-09-28 Thread Connell O'Callaghan
Ismaël and Ahmet thank you both for your responses!!!

Can the text on the site today - reproduced below - be enhanced with the
text in bold? Note I would not expect this new text to be in bold once
deployed to the site.
Triage release-blocking issues in JIRA

There could be outstanding release-blocking issues, which should be triaged
before proceeding to build a release candidate. We track them by assigning
a specific Fix version field even before the issue is resolved.

The list of release-blocking issues is available at the version status page.
Triage each unresolved issue with one of the following resolutions:

   - If the issue has been resolved and JIRA was not updated, resolve it
   accordingly.
   - If the issue has not been resolved and it is acceptable to defer this
   until the next release, update the Fix Version field to the new version
   you just created. Please consider discussing this with stakeholders and the
   dev@ mailing list, as appropriate.
   - If the issue has not been resolved and it is not acceptable to release
   until it is fixed, the release cannot proceed. Instead, work with the Beam
   community to resolve the issue.

*If a bug is found in the RC creation process/tools, it should be
considered high priority and fixed within 7 days.*

This is to prevent a regression in the amount of time it takes to cut an RC.
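
As a rough sketch of the three triage resolutions listed above (the function name, parameter names, and returned strings are made up for illustration and are not part of any Beam or JIRA tooling):

```python
def triage(resolved_in_practice, acceptable_to_defer):
    """Hypothetical helper: map an unresolved blocker to one of the three resolutions."""
    if resolved_in_practice:
        # Fixed already, JIRA just was not updated.
        return "resolve the issue in JIRA"
    if acceptable_to_defer:
        # Not fixed, but the release does not have to wait for it.
        return "move Fix Version to the next release"
    # Not fixed and not deferrable: the release cannot proceed.
    return "block the release until fixed"


print(triage(resolved_in_practice=False, acceptable_to_defer=True))
```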


On Fri, Sep 21, 2018 at 2:06 PM Ahmet Altay  wrote:

> I agree with Ismaël, this process is generally working well. As an
> improvement we could document it somewhere.
>
> One other area I think we can improve is, issues that are related to the
> release process. For example, at times we will have issues that require
> doing additional manual steps in the release process, we will file a JIRA
> and we tend to punt those JIRAs release after release as long as there is a
> manual work around. However, these issues add to the cost of cutting
> releases. I would propose prioritizing those issues that are identified
> during the release and about the release process itself.
>
> Ahmet
>
> On Wed, Sep 19, 2018 at 4:41 AM, Ismaël Mejía  wrote:
>
>> We have done this so far by leaving the JIRA issues 'Open' with the
>> 'Fix version' corresponding to the upcoming release, and we track the
>> progress between the branch cut and the first release candidate with
>> the assigned parties. This process has long been the de facto standard
>> and has worked smoothly so far. More info here:
>>
>>
>> https://beam.apache.org/contribute/release-guide/#triage-release-blocking-issues-in-jira
>>
>> Is there something missing, or do you have other ideas in mind to
>> improve it?
>>
>> On Wed, Sep 19, 2018 at 2:34 AM Connell O'Callaghan 
>> wrote:
>> >
>> > Hi All
>> >
>> > In order to allow successful and smooth deployment of the latest Beam
>> releases, is the community OK with tracking bugs that block releases, with a
>> goal of resolving such bugs within a week? If there is general agreement (or
>> no major objections) on this, we will edit the contributor page using
>> language similar to the "Stale pull requests" section early next week.
>> >
>> > Thank you all,
>> > - Connell
>>
>
>


Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #188

2018-09-28 Thread Apache Jenkins Server
See 


--
[...truncated 24.43 MB...]
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hadoop-input-format/2.8.0-SNAPSHOT/beam-sdks-java-io-hadoop-input-format-2.8.0-20180929.000300-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hadoop-input-format/2.8.0-SNAPSHOT/beam-sdks-java-io-hadoop-input-format-2.8.0-20180929.000300-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

46: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-hbase:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hbase/2.8.0-SNAPSHOT/beam-sdks-java-io-hbase-2.8.0-20180929.000311-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hbase/2.8.0-SNAPSHOT/beam-sdks-java-io-hbase-2.8.0-20180929.000311-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

47: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-hcatalog:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hcatalog/2.8.0-SNAPSHOT/beam-sdks-java-io-hcatalog-2.8.0-20180929.000319-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hcatalog/2.8.0-SNAPSHOT/beam-sdks-java-io-hcatalog-2.8.0-20180929.000319-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

48: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-jdbc:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jdbc/2.8.0-SNAPSHOT/beam-sdks-java-io-jdbc-2.8.0-20180929.000327-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jdbc/2.8.0-SNAPSHOT/beam-sdks-java-io-jdbc-2.8.0-20180929.000327-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

49: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-jms:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jms/2.8.0-SNAPSHOT/beam-sdks-java-io-jms-2.8.0-20180929.000335-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jms/2.8.0-SNAPSHOT/beam-sdks-java-io-jms-2.8.0-20180929.000335-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

50: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-kafka:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-kafka/2.8.0-SNAPSHOT/beam-sdks-java-io-kafka-2.8.0-20180929.000345-14.jar'.
  

Re: Python typing library is not provisional in Python 3.7

2018-09-28 Thread Manu Zhang
Aha, I see. I'm coming from the future.

Thanks Ahmet and Valentyn.

On Sat, Sep 29, 2018 at 3:06 AM Valentyn Tymofieiev 
wrote:

> Hi Manu,
>
> I second what Ahmet said - thanks for the pointers. Python 3.7 support can
> come later down the road.
>
> Thanks,
> Valentyn
>
> On Fri, Sep 28, 2018 at 11:17 AM Ahmet Altay  wrote:
>
>> Hi Manu,
>>
>> Currently, we use Python 3.5.2 on Jenkins for testing. Python tests print
>> out the python version in the console logs and I found this information
>> from one of the logs [1]. Initial proposal for the Python 3 support was to
>> support a specific version of python 3 during the porting process and later
>> on work to add support for additional versions [2]. (Also note that Python
>> 3.7 was released about 3 months ago, after the porting effort started.)
>>
>> Hope this helps.
>>
>> Ahmet
>>
>> [1]
>> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/6114/consoleFull
>> [2]
>> https://lists.apache.org/thread.html/5371469de567357b1431606f766217ef73a9098dc45046f51a6ecceb@%3Cdev.beam.apache.org%3E
>>
>> On Thu, Sep 27, 2018 at 10:09 PM, Manu Zhang 
>> wrote:
>>
>>> Hi Valentyn,
>>>
>>> I'm aware there is Python 3 environment and have worked on the options
>>>  module. Yes, I'd love to
>>> contribute more.
>>> The issue I raise here is specifically about Python 3.7, where the
>>> dependency on typing library would fail all the tests.
>>> Do you know which version of Python 3 is setup for our tests ?
>>>
>>> Manu
>>>
>>> On Fri, Sep 28, 2018 at 8:02 AM Valentyn Tymofieiev 
>>> wrote:
>>>
 Hi Manu,

 We have added a Python 3 environment to our tests (see [1]), and we are
 actively making changes to Beam code to make it Python 3-compatible. We are
 enabling tests module by module, although we have to disable some of the
 tests initially, when failures are likely introduced in other modules.

 I think @RobbeSneyders is currently working on typehints package
 specifically, as per our Kanban board [2].

 If you (or anyone else) is interested in helping with Python 3 support,
 and has cycles to actively work on it now, please reach out - I would be
 happy to coordinate the effort, and help with code reviews.

 Thanks,
 Valentyn

 [1]
 https://github.com/apache/beam/blob/5d298db4c20bbb8876a5b75142341332c1e3fb8d/sdks/python/tox.ini#L56
 [2]
 https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail


 On Thu, Sep 27, 2018 at 3:52 PM Manu Zhang 
 wrote:

> Hi all,
>
> I failed to run Python tests in 3.7 with the following error.
>
>   File "/Users/doria/git/incubator-beam/sdks/python/apache_beam/typehints/native_type_compatibility.py", line 23, in <module>
>     import typing
>   File "/Users/doria/git/incubator-beam/sdks/python/.eggs/typing-3.6.6-py3.7.egg/typing.py", line 1356, in <module>
>     class Callable(extra=collections_abc.Callable, metaclass=CallableMeta):
>   File "/Users/doria/git/incubator-beam/sdks/python/.eggs/typing-3.6.6-py3.7.egg/typing.py", line 1004, in __new__
>     self._abc_registry = extra._abc_registry
> AttributeError: type object 'Callable' has no attribute '_abc_registry'
>
> This is because the required typing library is not provisional in
> Python 3.7.
>
> Any thoughts on this? Shall we add a Python 3.7 environment to our tests?
>
> Thanks,
> Manu Zhang
>

>>


Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Kenneth Knowles
On Fri, Sep 28, 2018 at 10:29 AM Thomas Weise  wrote:

> +1 for stating the goal of clear provenance and granular rollback.
>

Also of course efficiency and quality of review (we don't want to review tiny or
out-of-context changes or huge mega-changes), efficiency of authoring
(don't want to wait on a review for a tiny bit because GitHub makes it very
hard to stack up reviews in sequence / don't want to have major changes
blocked because of difficulty of review), ease of new contribution (OK for
committers to do more IMO, while new/one-time contributors shouldn't need
to know or obey any policy).

> I think this discussion helps to remind/identify best practices for getting
> there. Where appropriate we should augment guidelines for both contributor
> and committer.
>

> Kenn: would you elaborate on the "1 commit = 1 review (and sometimes even =
> 1 ticket)" a bit more. Is that a problem of insufficient task / ticket
> granularity or something else?
>

The problem isn't a failure to define tasks at the right granularity, but
that they naturally and fundamentally exist with a different granularity
(unit of change -> unit of review -> unit of tracking/delivery). I'd guess
there's often a 5x jump in size from commit -> review -> ticket. There are
many, many easy-to-isolate changes that are wasteful to review or track
independently.

To get some concrete facts, I bet the one thing we could probably find
research on is the ideal review size. And we could also scrape logs for
messages with bullet points (often each bullet is basically what would have
been a commit). Generally getting a sense of what these ratios are in
practice and idealized would be kind of interesting but maybe overkill.
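
For the log-scraping idea, a minimal sketch might look like the following. It assumes commit messages are already available as strings (for example, pulled from `git log`), and the bullet pattern is just a guess at common styles; the function name is made up for illustration.

```python
import re

# A body line counts as a bullet if it starts with "-" or "*" followed by text.
BULLET = re.compile(r"^\s*[-*]\s+\S")


def count_bullets(message):
    """Count lines in a commit message body that look like bullet points."""
    body_lines = message.splitlines()[1:]  # skip the summary line
    return sum(1 for line in body_lines if BULLET.match(line))


example = "Clean up coder translation\n\n* rename helper\n- drop unused import\nplain text\n"
print(count_bullets(example))
```

Dividing the total bullet count by the number of commits would give a first approximation of how many logical changes each commit carries.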

Kenn

> Charles: my plan is to translate the outcome of this discussion to
> guideline updates and your log filter trick will be part of it. Though my
> hope is that it won't be needed if we get closer to the goals above.
>
> Thanks,
> Thomas
>
>
> On Fri, Sep 28, 2018 at 9:44 AM Kenneth Knowles  wrote:
>
>> Anton makes a good point. We have been talking about policy for what we
>> do, but the real issue is what we want to come out of it: a clear history
>> for seeing where code came from and granular rollback. I think in both
>> cases the key thing is that each commit is a single clear change. How they
>> get there is not the point.
>>
>> I have worked on multiple projects with a 1 commit = 1 review (and
>> sometimes even = 1 ticket). These pretty much never have a good history.
>> The best case is that each commit has a message that is a bullet point of
>> many separate changes, because it is simply too inefficient to review each
>> logical change separately. But since the messages become less useful, it
>> encourages a culture of not even bothering to write meaningful messages at
>> all.
>>
>> Note that it is trivial for a committer-reviewer to edit the history any way
>> they like without the button. "Allow edits by maintainers" is on by
>> default. The "Squash and merge" button just adds a button for something we
>> can already do.
>>
>> Charles: super useful! Worth noting that for a PR with a good history it
>> will skip meaningful commits (but still give the summary line, which is
>> nice).
>>
>> Kenn
>>
>>
>>
>> On Fri, Sep 28, 2018 at 8:54 AM Anton Kedin  wrote:
>>
>>> Is there an actual problem caused by squashing or not squashing the
>>> commits that we face in the project? I personally have never needed to
>>> revert something complicated that would be problematic either way (and
>>> don't have a strong opinion about which way we should do it). From what I
>>> see so far in the thread it doesn't look like reverting is a frequent major
>>> pain for anyone. Maybe it is exactly because we're mostly following some
>>> best practice and it makes it easy. If someone has concrete examples from
>>> their experience in the project, please share them, this way it would be
>>> easier to justify the choice.
>>>
>>> The PR and commit cleanliness, size and isolation are probably more
>>> important things to have guidance and maybe enforcement for. There are well
>>> known practices and guidelines that I think we should follow, and I think
>>> they will make squashing or not squashing mostly irrelevant. For example,
>>> if we accept that commits should have description that actually describes
>>> what commit does, then "!fixup", "address comments" and similar should not
>>> be part of the history and should be squashed before submitting the PR no
>>> matter which way we decide to go in general. Also, I think that making
>>> commits isolated is also a good practice, and PR author should be able to
>>> relatively easily split the PR upon reviewer's request. And if we choose to
>>> keep whole PRs small and incremental with descriptive isolated commits,
>>> then there won't be too much difference how many commits there are.
>>>
>>> Regards,
>>> Anton
>>>
>>> On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud 
>>> wrote:
>>>
 I brought up this 

Re: Python typing library is not provisional in Python 3.7

2018-09-28 Thread Valentyn Tymofieiev
Hi Manu,

I second what Ahmet said - thanks for the pointers. Python 3.7 support can
come later down the road.

Thanks,
Valentyn

On Fri, Sep 28, 2018 at 11:17 AM Ahmet Altay  wrote:

> Hi Manu,
>
> Currently, we use Python 3.5.2 on Jenkins for testing. Python tests print
> out the python version in the console logs and I found this information
> from one of the logs [1]. Initial proposal for the Python 3 support was to
> support a specific version of python 3 during the porting process and later
> on work to add support for additional versions [2]. (Also note that Python
> 3.7 was released about 3 months ago, after the porting effort started.)
>
> Hope this helps.
>
> Ahmet
>
> [1]
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/6114/consoleFull
> [2]
> https://lists.apache.org/thread.html/5371469de567357b1431606f766217ef73a9098dc45046f51a6ecceb@%3Cdev.beam.apache.org%3E
>
> On Thu, Sep 27, 2018 at 10:09 PM, Manu Zhang 
> wrote:
>
>> Hi Valentyn,
>>
>> I'm aware there is Python 3 environment and have worked on the options
>>  module. Yes, I'd love to
>> contribute more.
>> The issue I raise here is specifically about Python 3.7, where the
>> dependency on typing library would fail all the tests.
>> Do you know which version of Python 3 is setup for our tests ?
>>
>> Manu
>>
>> On Fri, Sep 28, 2018 at 8:02 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> Hi Manu,
>>>
>>> We have added a Python 3 environment to our tests (see [1]), and we are
>>> actively making changes to Beam code to make it Python 3-compatible. We are
>>> enabling tests module by module, although we have to disable some of the
>>> tests initially, when failures are likely introduced in other modules.
>>>
>>> I think @RobbeSneyders is currently working on typehints package
>>> specifically, as per our Kanban board [2].
>>>
>>> If you (or anyone else) is interested in helping with Python 3 support,
>>> and has cycles to actively work on it now, please reach out - I would be
>>> happy to coordinate the effort, and help with code reviews.
>>>
>>> Thanks,
>>> Valentyn
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/5d298db4c20bbb8876a5b75142341332c1e3fb8d/sdks/python/tox.ini#L56
>>> [2]
>>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail
>>>
>>>
>>> On Thu, Sep 27, 2018 at 3:52 PM Manu Zhang 
>>> wrote:
>>>
 Hi all,

 I failed to run Python tests in 3.7 with the following error.

   File "/Users/doria/git/incubator-beam/sdks/python/apache_beam/typehints/native_type_compatibility.py", line 23, in <module>
     import typing
   File "/Users/doria/git/incubator-beam/sdks/python/.eggs/typing-3.6.6-py3.7.egg/typing.py", line 1356, in <module>
     class Callable(extra=collections_abc.Callable, metaclass=CallableMeta):
   File "/Users/doria/git/incubator-beam/sdks/python/.eggs/typing-3.6.6-py3.7.egg/typing.py", line 1004, in __new__
     self._abc_registry = extra._abc_registry
 AttributeError: type object 'Callable' has no attribute '_abc_registry'

 This is because the required typing library is not provisional in
 Python 3.7.

 Any thoughts on this? Shall we add a Python 3.7 environment to our tests?

 Thanks,
 Manu Zhang

>>>
>


Re: Agenda for the Beam Summit London 2018

2018-09-28 Thread Danny Angus


How exciting, can't wait to join you guys on Monday!
:-)
D.

On 2018/09/27 22:03:16, Griselda Cuevas  wrote: 
> Hi Beam Community,
> 
> We have finalized the agenda for the Beam Summit London 2018, it's here:
> https://www.linkedin.com/feed/update/urn:li:activity:6450125487321735168/
> 
> 
> We had a great amount of talk proposals, thank you so much to everyone who
> submitted one! We also sold out the event, so we're very excited to see the
> community growing.
> 
> 
> See you around,
> 
> Gris on behalf of the Organizing Committee
> 


Re: Why not adding all coders into ModelCoderRegistrar?

2018-09-28 Thread Shen Li
Thank you, Lukasz!

Best,
Shen

On Fri, Sep 28, 2018 at 2:11 PM Lukasz Cwik  wrote:

> Runners can never know about every coder that a user may want to write
> which is why we need to have a mechanism for Runners to be able to convert
> any unknown coder to one it can handle. This is done via
> WireCoders.instantiateRunnerWireCoder but this modifies the original coder
> which is why WireCoders.addSdkWireCoder creates the proto definition that
> the SDK should be told to use. In your case, you're correct in that KV T> becomes KVCoder,
> LengthPrefixCoder> on the runner side and on the SDK side
> it should be KVCoder,
> LengthPrefixCoder>. More details in [1].
>
> 1:
> http://doc/1IGduUqmhWDi_69l9nG8kw73HZ5WI5wOps9Tshl5wpQA#heading=h.sh4d5klmtfis
>
>
>
> On Fri, Sep 28, 2018 at 11:02 AM Shen Li  wrote:
>
>> Hi,
>>
>> I noticed that ModelCoderRegistrar only includes 9 out of ~40 coders. May
>> I know the rationale behind this decision?
>>
>>
>> https://github.com/apache/beam/blob/release-2.7.0/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java
>>
>> I think one consequence of the above configuration is
>> that WireCoders.instantiateRunnerWireCoder cannot instantiate KV coders
>> correctly, where VoidCoder (key coder) becomes
>> LengthPrefixCoder(ByteArrayCoder). What is the appropriate way to get
>> KvCoder from RunnerApi.Pipeline?
>>
>> Thanks,
>> Shen
>>
>


Re: Python typing library is not provisional in Python 3.7

2018-09-28 Thread Ahmet Altay
Hi Manu,

Currently, we use Python 3.5.2 on Jenkins for testing. Python tests print
out the python version in the console logs and I found this information
from one of the logs [1]. Initial proposal for the Python 3 support was to
support a specific version of python 3 during the porting process and later
on work to add support for additional versions [2]. (Also note that Python
3.7 was released about 3 months ago, after the porting effort started.)

Hope this helps.

Ahmet

[1]
https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/6114/consoleFull
[2]
https://lists.apache.org/thread.html/5371469de567357b1431606f766217ef73a9098dc45046f51a6ecceb@%3Cdev.beam.apache.org%3E
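
For context on the failure reported at the start of this thread: `typing` has been part of the standard library since Python 3.5, and the traceback shows the separately installed `typing-3.6.6` egg being imported on 3.7. One possible approach, sketched below under the assumption that the backport is only needed on older interpreters, is to gate the dependency on the interpreter version. The helper name and the version bounds are illustrative and are not taken from Beam's `setup.py`.

```python
import sys

# Illustrative requirement string; the version bounds are an assumption.
TYPING_BACKPORT = "typing>=3.6.0,<3.7.0"


def typing_requirements(version_info=sys.version_info):
    """Return the typing backport only for interpreters without a stdlib typing."""
    if tuple(version_info[:2]) < (3, 5):
        return [TYPING_BACKPORT]
    # Python 3.5+ ships typing in the standard library, so nothing extra is needed.
    return []


print(typing_requirements((2, 7, 15)))
print(typing_requirements((3, 7, 0)))
```

The same condition can also be written declaratively as an environment marker on the requirement, for example `typing; python_version < "3.5"`.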

On Thu, Sep 27, 2018 at 10:09 PM, Manu Zhang 
wrote:

> Hi Valentyn,
>
> I'm aware there is Python 3 environment and have worked on the options
>  module. Yes, I'd love to
> contribute more.
> The issue I raise here is specifically about Python 3.7, where the
> dependency on typing library would fail all the tests.
> Do you know which version of Python 3 is setup for our tests ?
>
> Manu
>
> On Fri, Sep 28, 2018 at 8:02 AM Valentyn Tymofieiev 
> wrote:
>
>> Hi Manu,
>>
>> We have added a Python 3 environment to our tests (see [1]), and we are
>> actively making changes to Beam code to make it Python 3-compatible. We are
>> enabling tests module by module, although we have to disable some of the
>> tests initially, when failures are likely introduced in other modules.
>>
>> I think @RobbeSneyders is currently working on typehints package
>> specifically, as per our Kanban board [2].
>>
>> If you (or anyone else) is interested in helping with Python 3 support,
>> and has cycles to actively work on it now, please reach out - I would be
>> happy to coordinate the effort, and help with code reviews.
>>
>> Thanks,
>> Valentyn
>>
>> [1] https://github.com/apache/beam/blob/5d298db4c20bbb8876a5b75142341332c1e3fb8d/sdks/python/tox.ini#L56
>> [2] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail
>>
>> On Thu, Sep 27, 2018 at 3:52 PM Manu Zhang 
>> wrote:
>>
>>> Hi all,
>>>
>>> I failed to run Python tests in 3.7 with the following error.
>>>
>>>   File "/Users/doria/git/incubator-beam/sdks/python/apache_beam/typehints/native_type_compatibility.py", line 23, in <module>
>>>     import typing
>>>   File "/Users/doria/git/incubator-beam/sdks/python/.eggs/typing-3.6.6-py3.7.egg/typing.py", line 1356, in <module>
>>>     class Callable(extra=collections_abc.Callable, metaclass=CallableMeta):
>>>   File "/Users/doria/git/incubator-beam/sdks/python/.eggs/typing-3.6.6-py3.7.egg/typing.py", line 1004, in __new__
>>>     self._abc_registry = extra._abc_registry
>>> AttributeError: type object 'Callable' has no attribute '_abc_registry'
>>>
>>> This is because the required typing library is not provisional in
>>> Python 3.7.
>>>
>>> Any thoughts on this? Shall we add a Python 3.7 environment to our tests?
>>>
>>> Thanks,
>>> Manu Zhang
>>>
>>


Re: Why not adding all coders into ModelCoderRegistrar?

2018-09-28 Thread Lukasz Cwik
Runners can never know about every coder that a user may want to write
which is why we need to have a mechanism for Runners to be able to convert
any unknown coder to one it can handle. This is done via
WireCoders.instantiateRunnerWireCoder but this modifies the original coder
which is why WireCoders.addSdkWireCoder creates the proto definition that
the SDK should be told to use. In your case, you're correct in that KV becomes KVCoder,
LengthPrefixCoder> on the runner side and on the SDK side
it should be KVCoder,
LengthPrefixCoder>. More details in [1].

1:
http://doc/1IGduUqmhWDi_69l9nG8kw73HZ5WI5wOps9Tshl5wpQA#heading=h.sh4d5klmtfis
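
To make the runner-side versus SDK-side distinction concrete, here is a small toy model in Python. It is not the Java `WireCoders` API: coders are modelled as plain `(name, components)` tuples, and the set of "known" coder names is invented for illustration. The only behaviour it tries to mirror is the one described in this thread: a coder the runner does not know is seen by the runner as length-prefixed bytes, while the SDK keeps the real coder wrapped in a length prefix.

```python
# Toy model: a coder is (name, [component coders]). All names are illustrative.
KNOWN = {"kv", "bytes", "length_prefix"}


def runner_side(coder):
    """Coder tree as the runner would instantiate it."""
    name, components = coder
    if name == "length_prefix":
        # Already length-prefixed: the runner only needs opaque bytes inside.
        return ("length_prefix", [("bytes", [])])
    if name in KNOWN:
        return (name, [runner_side(c) for c in components])
    # Unknown to the runner: treat the payload as length-prefixed bytes.
    return ("length_prefix", [("bytes", [])])


def sdk_side(coder):
    """Coder tree the SDK should be told to use so the wire format matches."""
    name, components = coder
    if name == "length_prefix":
        return coder
    if name in KNOWN:
        return (name, [sdk_side(c) for c in components])
    # The SDK keeps the real coder but adds the length prefix.
    return ("length_prefix", [coder])


kv = ("kv", [("void", []), ("custom", [])])
print(runner_side(kv))
print(sdk_side(kv))
```

Both trees produce the same bytes on the wire in this model, which is why the runner can shuffle values it cannot decode.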



On Fri, Sep 28, 2018 at 11:02 AM Shen Li  wrote:

> Hi,
>
> I noticed that ModelCoderRegistrar only includes 9 out of ~40 coders. May
> I know the rationale behind this decision?
>
>
> https://github.com/apache/beam/blob/release-2.7.0/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java
>
> I think one consequence of the above configuration is
> that WireCoders.instantiateRunnerWireCoder cannot instantiate KV coders
> correctly, where VoidCoder (key coder) becomes
> LengthPrefixCoder(ByteArrayCoder). What is the appropriate way to get
> KvCoder from RunnerApi.Pipeline?
>
> Thanks,
> Shen
>


Why not adding all coders into ModelCoderRegistrar?

2018-09-28 Thread Shen Li
Hi,

I noticed that ModelCoderRegistrar only includes 9 out of ~40 coders. May I
know the rationale behind this decision?

https://github.com/apache/beam/blob/release-2.7.0/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ModelCoderRegistrar.java

I think one consequence of the above configuration is
that WireCoders.instantiateRunnerWireCoder cannot instantiate KV coders
correctly, where VoidCoder (key coder) becomes
LengthPrefixCoder(ByteArrayCoder). What is the appropriate way to get
KvCoder from RunnerApi.Pipeline?

Thanks,
Shen


Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Thomas Weise
+1 for stating the goal of clear provenance and granular rollback.

I think this discussion helps to identify, and remind us of, best practices
for getting there. Where appropriate, we should augment the guidelines for
both contributors and committers.

Kenn: would you elaborate a bit more on the "1 commit = 1 review (and
sometimes even = 1 ticket)" point? Is that a problem of insufficient task /
ticket granularity or something else?

Charles: my plan is to translate the outcome of this discussion to
guideline updates and your log filter trick will be part of it. Though my
hope is that it won't be needed if we get closer to the goals above.

Thanks,
Thomas


On Fri, Sep 28, 2018 at 9:44 AM Kenneth Knowles  wrote:

> Anton makes a good point. We have been talking about policy for what we
> do, but the real issue is what we want to come out of it: a clear history
> for seeing where code came from and granular rollback. I think in both
> cases the key thing is that each commit is a single clear change. How they
> get there is not the point.
>
> I have worked on multiple projects with a 1 commit = 1 review (and
> sometimes even = 1 ticket). These pretty much never have a good history.
> The best case is that each commit has a message that is a bullet point of
> many separate changes, because it is simply too inefficient to review each
> logical change separately. But since the messages become less useful, it
> encourages a culture of not even bothering to write meaningful messages at
> all.
>
> Note that it is trivial for a committer-reviewer to edit the history any way
> they like without the button. "Allow edits by maintainers" is on by
> default. The "Squash and merge" button just adds a button for something we
> can already do.
>
> Charles: super useful! Worth noting that for a PR with a good history it
> will skip meaningful commits (but still give the summary line, which is
> nice).
>
> Kenn
>
>
>
> On Fri, Sep 28, 2018 at 8:54 AM Anton Kedin  wrote:
>
>> Is there an actual problem caused by squashing or not squashing the
>> commits that we face in the project? I personally have never needed to
>> revert something complicated that would be problematic either way (and
>> don't have a strong opinion about which way we should do it). From what I
>> see so far in the thread it doesn't look like reverting is a frequent major
>> pain for anyone. Maybe it is exactly because we're mostly following some
>> best practice and it makes it easy. If someone has concrete examples from
>> their experience in the project, please share them, this way it would be
>> easier to justify the choice.
>>
>> The PR and commit cleanliness, size and isolation are probably more
>> important thing to have guidance and maybe enforcement for. There are well
>> known practices and guidelines that I think we should follow, and I think
>> they will make squashing or not squashing mostly irrelevant. For example,
>> if we accept that commits should have description that actually describes
>> what commit does, then "!fixup", "address comments" and similar should not
>> be part of the history and should be squashed before submitting the PR no
>> matter which way we decide to go in general. Also, I think that making
>> commits isolated is also a good practice, and PR author should be able to
>> relatively easily split the PR upon reviewer's request. And if we choose to
>> keep whole PRs small and incremental with descriptive isolated commits,
>> then there won't be too much difference how many commits there are.
>>
>> Regards,
>> Anton
>>
>> On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud 
>> wrote:
>>
>>> I brought up this discussion a few months ago from the other side: I
>>> don't like my commits being squashed. I try to create logical commits that
>>> each passes tests and could be broken up into multiple PRs. Keeping those
>>> changes intact is useful from a history perspective and squashing may break
>>> other PRs I have in flight. If the intent is clear (one commit with a
>>> descriptive message and a bunch of "fixups"), then feel free to squash,
>>> otherwise ask first. When you do squash, it would be good to leave a
>>> comment as to how the author can avoid having their commits squashed in the
>>> future.
>>>
>>>
>>> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E
>>>
>>> Andrew
>>>
>>> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath 
>>> wrote:
>>>


 On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw 
 wrote:

> I agree that we should create a good pointer for cleaning up PRs, and
> request (though not require) that authors do it. It's unfortunate though
> that squashing during a review makes things difficult to follow, so adds
> one more round trip.
>
> We could consider for those PRs that make sense as a single logical
> commit (most, but not all, of them) simply using the "squash and merge"
> button even though it technically doesn't 

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Kenneth Knowles
Anton makes a good point. We have been talking about policy for what we do,
but the real issue is what we want to come out of it: a clear history for
seeing where code came from and granular rollback. I think in both cases
the key thing is that each commit is a single clear change. How they get
there is not the point.

I have worked on multiple projects with a 1 commit = 1 review (and
sometimes even = 1 ticket). These pretty much never have a good history.
The best case is that each commit has a message that is a bullet point of
many separate changes, because it is simply too inefficient to review each
logical change separately. But since the messages become less useful, it
encourages a culture of not even bothering to write meaningful messages at
all.

Note that it is trivial for a committer-reviewer to edit the history any way
they like without the button. "Allow edits by maintainers" is on by
default. The "Squash and merge" button just adds a button for something we
can already do.

Charles: super useful! Worth noting that for a PR with a good history it
will skip meaningful commits (but still give the summary line, which is
nice).

Kenn



On Fri, Sep 28, 2018 at 8:54 AM Anton Kedin  wrote:

> Is there an actual problem caused by squashing or not squashing the
> commits that we face in the project? I personally have never needed to
> revert something complicated that would be problematic either way (and
> don't have a strong opinion about which way we should do it). From what I
> see so far in the thread it doesn't look like reverting is a frequent major
> pain for anyone. Maybe it is exactly because we're mostly following some
> best practice and it makes it easy. If someone has concrete examples from
> their experience in the project, please share them, this way it would be
> easier to justify the choice.
>
> The PR and commit cleanliness, size and isolation are probably more
> important thing to have guidance and maybe enforcement for. There are well
> known practices and guidelines that I think we should follow, and I think
> they will make squashing or not squashing mostly irrelevant. For example,
> if we accept that commits should have description that actually describes
> what commit does, then "!fixup", "address comments" and similar should not
> be part of the history and should be squashed before submitting the PR no
> matter which way we decide to go in general. Also, I think that making
> commits isolated is also a good practice, and PR author should be able to
> relatively easily split the PR upon reviewer's request. And if we choose to
> keep whole PRs small and incremental with descriptive isolated commits,
> then there won't be too much difference how many commits there are.
>
> Regards,
> Anton
>
> On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud 
> wrote:
>
>> I brought up this discussion a few months ago from the other side: I
>> don't like my commits being squashed. I try to create logical commits that
>> each passes tests and could be broken up into multiple PRs. Keeping those
>> changes intact is useful from a history perspective and squashing may break
>> other PRs I have in flight. If the intent is clear (one commit with a
>> descriptive message and a bunch of "fixups"), then feel free to squash,
>> otherwise ask first. When you do squash, it would be good to leave a
>> comment as to how the author can avoid having their commits squashed in the
>> future.
>>
>>
>> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E
>>
>> Andrew
>>
>> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath 
>> wrote:
>>
>>>
>>>
>>> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw 
>>> wrote:
>>>
 I agree that we should create a good pointer for cleaning up PRs, and
 request (though not require) that authors do it. It's unfortunate though
 that squashing during a review makes things difficult to follow, so adds
 one more round trip.

 We could consider for those PRs that make sense as a single logical
 commit (most, but not all, of them) simply using the "squash and merge"
 button even though it technically doesn't create a merge commit.

>>>
>>> +1 for allowing "squash and merge" as an option. Most of the reviews (at
>>> least for me) consist of a single valid commit and several additional
>>> commits that get piled up during the review process which obviously should
>>> not be included in the commit history. Going through another round here
>>> just to ask the author to fixup everything is unnecessarily time consuming.
>>>
>>> - Cham
>>>
>>>


 On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira 
 wrote:

> As a non-committer I think some automated squashing of commits sounds
> best since it lightens the load of regular contributors, by not having to
> always remember to squash, and lightens the load of committers so it
> doesn't take as long to have your PR approved by 

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Charles Chen
+1 to Anton's points.  It looks like the main concern with unsquashed
commits is aesthetic, in that having "fixup!" commits produces noise and
clutters the code tree.  I would like to point out again, for those unaware,
that "git log --first-parent" filters the commit history so that each PR
corresponds to one and only one commit (which would not show these fixups),
which seems to be the view that people are looking for.
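
For anyone who wants to see why that filter hides the fixups, here is a small Python model of the traversal. It is only a sketch: the commit names and the `PARENTS` graph are invented, and it assumes each PR lands on master through a merge commit whose first parent is the previous state of master.

```python
def first_parent_log(parents, head):
    """Return the commits reached from head by following only first parents."""
    log = []
    commit = head
    while commit is not None:
        log.append(commit)
        commit_parents = parents.get(commit, [])
        commit = commit_parents[0] if commit_parents else None
    return log


# Hypothetical history: two PRs merged into master with merge commits.
# For a merge commit, the first parent is master, the second is the PR branch.
PARENTS = {
    "merge-pr-2": ["merge-pr-1", "pr2-fixup"],
    "pr2-fixup": ["pr2-change"],
    "pr2-change": ["merge-pr-1"],
    "merge-pr-1": ["root", "pr1-fixup"],
    "pr1-fixup": ["pr1-change"],
    "pr1-change": ["root"],
    "root": [],
}

if __name__ == "__main__":
    print(first_parent_log(PARENTS, "merge-pr-2"))
```

The commits on the PR branches are only reachable through second parents, so they never appear in this view. A squashed or fast-forwarded PR has no merge commit, so the one-entry-per-PR property depends on how the PR was merged.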

On Fri, Sep 28, 2018 at 11:54 AM Anton Kedin  wrote:

> Is there an actual problem caused by squashing or not squashing the
> commits that we face in the project? I personally have never needed to
> revert something complicated that would be problematic either way (and
> don't have a strong opinion about which way we should do it). From what I
> see so far in the thread it doesn't look like reverting is a frequent major
> pain for anyone. Maybe it is exactly because we're mostly following some
> best practice and it makes it easy. If someone has concrete examples from
> their experience in the project, please share them, this way it would be
> easier to justify the choice.
>
> The PR and commit cleanliness, size and isolation are probably more
> important thing to have guidance and maybe enforcement for. There are well
> known practices and guidelines that I think we should follow, and I think
> they will make squashing or not squashing mostly irrelevant. For example,
> if we accept that commits should have description that actually describes
> what commit does, then "!fixup", "address comments" and similar should not
> be part of the history and should be squashed before submitting the PR no
> matter which way we decide to go in general. Also, I think that making
> commits isolated is also a good practice, and PR author should be able to
> relatively easily split the PR upon reviewer's request. And if we choose to
> keep whole PRs small and incremental with descriptive isolated commits,
> then there won't be too much difference how many commits there are.
>
> Regards,
> Anton
>
> On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud 
> wrote:
>
>> I brought up this discussion a few months ago from the other side: I
>> don't like my commits being squashed. I try to create logical commits that
>> each passes tests and could be broken up into multiple PRs. Keeping those
>> changes intact is useful from a history perspective and squashing may break
>> other PRs I have in flight. If the intent is clear (one commit with a
>> descriptive message and a bunch of "fixups"), then feel free to squash,
>> otherwise ask first. When you do squash, it would be good to leave a
>> comment as to how the author can avoid having their commits squashed in the
>> future.
>>
>>
>> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E
>>
>> Andrew
>>
>> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath 
>> wrote:
>>
>>>
>>>
>>> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw 
>>> wrote:
>>>
 I agree that we should create a good pointer for cleaning up PRs, and
 request (though not require) that authors do it. It's unfortunate though
 that squashing during a review makes things difficult to follow, so adds
 one more round trip.

 We could consider for those PRs that make sense as a single logical
 commit (most, but not all, of them) simply using the "squash and merge"
 button even though it technically doesn't create a merge commit.

>>>
>>> +1 for allowing "squash and merge" as an option. Most of the reviews (at
>>> least for me) consist of a single valid commit and several additional
>>> commits that get piled up during the review process which obviously should
>>> not be included in the commit history. Going through another round here
>>> just to ask the author to fixup everything is unnecessarily time consuming.
>>>
>>> - Cham
>>>
>>>


 On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira 
 wrote:

> As a non-committer I think some automated squashing of commits sounds
> best since it lightens the load of regular contributors, by not having to
> always remember to squash, and lightens the load of committers so it
> doesn't take as long to have your PR approved by one.
>
> But for now I think the second best route would be making it PR
> author's responsibility to squash fixup commits. Having that expectation
> described clearly in the Contributor's Guide, along with some simple
> step-by-step instructions for how to do so should be enough. I mainly
> support this because I've been doing the squashing myself since I saw a
> thread about it here a few months ago. It's not nearly as huge a burden on
> me as it probably is for committers who have to merge in many more PRs,
> it's very easy to learn how to do, and it's one less barrier to having my
> code merged in.
>
> Of course I wouldn't expect that committers wait for PR authors to
> 

Re: Agenda for the Beam Summit London 2018

2018-09-28 Thread Rose Nguyen
Wow, this looks fantastic! Thanks to the organizers!

On Thu, Sep 27, 2018 at 11:29 PM Andrew Psaltis 
wrote:

> This is great. Any chance it will be recorded, or at a minimum the slides
> made available afterwards? Unfortunately, I won't be able to make it to
> London next week.
>
> Best,
> Andrew
>
> On Fri, Sep 28, 2018 at 10:11 AM Pablo Estrada  wrote:
>
>> Very exciting. I will have to miss it, but I'm excited to see what comes
>> out of it:)
>> Thanks to Gris, Matthias and other organizers.
>> Best
>> -P.
>>
>> On Thu, Sep 27, 2018, 4:26 PM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Great !! Thanks Gris.
>>>
>>> Looking forward to seeing you all next Monday in London.
>>>
>>> Regards
>>>
>>> JB
>>> On Sep 27, 2018, at 18:03, Griselda Cuevas wrote:

 Hi Beam Community,

 We have finalized the agenda for the Beam Summit London 2018, it's
 here:
 https://www.linkedin.com/feed/update/urn:li:activity:6450125487321735168/


 We had a great amount of talk proposals, thank you so much to everyone
 who submitted one! We also sold out the event, so we're very excited to see
 the community growing.


 See you around,

 Gris on behalf of the Organizing Committee

>>>

-- 
Rose Thị Nguyễn


Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Anton Kedin
Is there an actual problem caused by squashing or not squashing the commits
that we face in the project? I personally have never needed to revert
something complicated that would be problematic either way (and don't have
a strong opinion about which way we should do it). From what I see so far
in the thread it doesn't look like reverting is a frequent major pain for
anyone. Maybe it is exactly because we're mostly following some best
practice and it makes it easy. If someone has concrete examples from their
experience in the project, please share them; that would make it easier
to justify the choice.

The PR and commit cleanliness, size and isolation are probably the more
important things to have guidance and maybe enforcement for. There are well
known practices and guidelines that I think we should follow, and I think
they will make squashing or not squashing mostly irrelevant. For example,
if we accept that commits should have a description that actually describes
what the commit does, then "fixup!", "address comments" and similar should not
be part of the history and should be squashed before submitting the PR no
matter which way we decide to go in general. Also, I think that making
commits isolated is a good practice, and the PR author should be able to
relatively easily split the PR upon the reviewer's request. And if we choose to
keep whole PRs small and incremental with descriptive isolated commits,
then it won't make much difference how many commits there are.
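
To illustrate what squashing the fixups amounts to, here is a simplified Python model of the folding that "git rebase -i --autosquash" performs on commits whose subject starts with "fixup! ". It is a sketch under assumptions: it matches on the exact subject only (git can also match a subject prefix or a commit hash), and the subjects in the example are invented.

```python
FIXUP_PREFIX = "fixup! "


def autosquash(subjects):
    """Fold 'fixup! X' entries into the earlier commit whose subject is X.

    Returns a list of (subject, number_of_folded_fixups) in history order.
    """
    result = []  # mutable [subject, folded_count] pairs
    for subject in subjects:
        if subject.startswith(FIXUP_PREFIX):
            target = subject[len(FIXUP_PREFIX):]
            for entry in result:
                if entry[0] == target:
                    entry[1] += 1
                    break
            else:
                # No matching earlier commit: keep the fixup as its own entry.
                result.append([subject, 0])
        else:
            result.append([subject, 0])
    return [(subject, count) for subject, count in result]


if __name__ == "__main__":
    history = ["Add coder", "fixup! Add coder", "Fix test", "fixup! Add coder"]
    print(autosquash(history))
```

The resulting history keeps one descriptive commit per logical change, which is the outcome wanted whether the author or the committer does the squashing.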

Regards,
Anton

On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud  wrote:

> I brought up this discussion a few months ago from the other side: I don't
> like my commits being squashed. I try to create logical commits that each
> passes tests and could be broken up into multiple PRs. Keeping those
> changes intact is useful from a history perspective and squashing may break
> other PRs I have in flight. If the intent is clear (one commit with a
> descriptive message and a bunch of "fixups"), then feel free to squash,
> otherwise ask first. When you do squash, it would be good to leave a
> comment as to how the author can avoid having their commits squashed in the
> future.
>
>
> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E
>
> Andrew
>
> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw 
>> wrote:
>>
>>> I agree that we should create a good pointer for cleaning up PRs, and
>>> request (though not require) that authors do it. It's unfortunate though
>>> that squashing during a review makes things difficult to follow, so adds
>>> one more round trip.
>>>
>>> We could consider for those PRs that make sense as a single logical
>>> commit (most, but not all, of them) simply using the "squash and merge"
>>> button even though it technically doesn't create a merge commit.
>>>
>>
>> +1 for allowing "squash and merge" as an option. Most of the reviews (at
>> least for me) consist of a single valid commit and several additional
>> commits that get piled up during the review process which obviously should
>> not be included in the commit history. Going through another round here
>> just to ask the author to fixup everything is unnecessarily time consuming.
>>
>> - Cham
>>
>>
>>>
>>>
>>> On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira 
>>> wrote:
>>>
 As a non-committer I think some automated squashing of commits sounds
 best since it lightens the load of regular contributors, by not having to
 always remember to squash, and lightens the load of committers so it
 doesn't take as long to have your PR approved by one.

 But for now I think the second best route would be making it PR
 author's responsibility to squash fixup commits. Having that expectation
 described clearly in the Contributor's Guide, along with some simple
 step-by-step instructions for how to do so should be enough. I mainly
 support this because I've been doing the squashing myself since I saw a
 thread about it here a few months ago. It's not nearly as huge a burden on
 me as it probably is for committers who have to merge in many more PRs,
 it's very easy to learn how to do, and it's one less barrier to having my
 code merged in.

 Of course I wouldn't expect that committers wait for PR authors to
 squash their fixup commits, but I think leaving a message like "For future
 pull requests you should squash any small fixup commits, as described here:
 " should be fine.


> I was also thinking about the possibility of wanting to revert
> individual commits from a merge commit. The solution you propose
> works,
> but only if you want to revert everything.


 Does this happen often? I might not have enough context since I'm not a
 committer, but it seems to me that often the person performing a revert is
 not the original author of a 

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Thomas Weise
Thanks for linking the previous discussion.

I have also seen a few cases where the intention was to make individual
changes that can be applied independently. But why not create separate PRs
for those, so they can also be reviewed and merged independently?

Also, if the intention is to make independent, non-squashable commits,
these commits should be tagged appropriately.  That along with "edit
allowed by maintainers" could give sufficient indication to the reviewer.

Thomas



On Fri, Sep 28, 2018 at 8:21 AM Andrew Pilloud  wrote:

> I brought up this discussion a few months ago from the other side: I don't
> like my commits being squashed. I try to create logical commits that each
> passes tests and could be broken up into multiple PRs. Keeping those
> changes intact is useful from a history perspective and squashing may break
> other PRs I have in flight. If the intent is clear (one commit with a
> descriptive message and a bunch of "fixups"), then feel free to squash,
> otherwise ask first. When you do squash, it would be good to leave a
> comment as to how the author can avoid having their commits squashed in the
> future.
>
>
> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E
>
> Andrew
>
> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw 
>> wrote:
>>
>>> I agree that we should create a good pointer for cleaning up PRs, and
>>> request (though not require) that authors do it. It's unfortunate though
>>> that squashing during a review makes things difficult to follow, so adds
>>> one more round trip.
>>>
>>> We could consider for those PRs that make sense as a single logical
>>> commit (most, but not all, of them) simply using the "squash and merge"
>>> button even though it technically doesn't create a merge commit.
>>>
>>
>> +1 for allowing "squash and merge" as an option. Most of the reviews (at
>> least for me) consist of a single valid commit and several additional
>> commits that get piled up during the review process which obviously should
>> not be included in the commit history. Going through another round here
>> just to ask the author to fixup everything is unnecessarily time consuming.
>>
>> - Cham
>>
>>
>>>
>>>
>>> On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira 
>>> wrote:
>>>
 As a non-committer I think some automated squashing of commits sounds
 best since it lightens the load of regular contributors, by not having to
 always remember to squash, and lightens the load of committers so it
 doesn't take as long to have your PR approved by one.

 But for now I think the second best route would be making it PR
 author's responsibility to squash fixup commits. Having that expectation
 described clearly in the Contributor's Guide, along with some simple
 step-by-step instructions for how to do so should be enough. I mainly
 support this because I've been doing the squashing myself since I saw a
 thread about it here a few months ago. It's not nearly as huge a burden on
 me as it probably is for committers who have to merge in many more PRs,
 it's very easy to learn how to do, and it's one less barrier to having my
 code merged in.

 Of course I wouldn't expect that committers wait for PR authors to
 squash their fixup commits, but I think leaving a message like "For future
 pull requests you should squash any small fixup commits, as described here:
 " should be fine.


> I was also thinking about the possibility of wanting to revert
> individual commits from a merge commit. The solution you propose
> works,
> but only if you want to revert everything.


 Does this happen often? I might not have enough context since I'm not a
 committer, but it seems to me that often the person performing a revert is
 not the original author of a change and doesn't have the context or time to
 pick out an individual commit to revert.

 On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels 
 wrote:

> I tend to agree with you Lukasz. Of course we should try to follow the
> guide lines as much as possible but if it requires an extra back and
> forth with the PR author for a cosmetic change, it may not be worth
> the
> time.
>
> On 19.09.18 22:17, Lukasz Cwik wrote:
> > I have to say I'm guilty of not following the merge guidelines,
> > sometimes doing merges without rebasing/flatten commits.
> >
> > I find that it is a few extra mins of my time to fix someones PR
> history
> > if they have more then one logical commit they want to be separate
> and
> > it usually takes days for the PR author to do merging  with the
> extra
> > burden as a committer to keep track of another PR and its state
> (waiting
> > for clean-up) is taxing. I really liked the idea of the 

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Robert Bradshaw
Fully agree, if there is a logical commit history, we should keep it. I
think this is speaking to the large number of PRs that have a single "real"
commit, a bunch of fixups, and specifically authors who haven't gone
through and cleaned up their history.

(Now if the commits could logically be broken up into separate PRs, well,
maybe they should be, but that's a separate discussion. There are
definitely times where multiple commits make sense as a single PR too.)

On Fri, Sep 28, 2018 at 5:21 PM Andrew Pilloud  wrote:

> I brought up this discussion a few months ago from the other side: I don't
> like my commits being squashed. I try to create logical commits that each
> passes tests and could be broken up into multiple PRs. Keeping those
> changes intact is useful from a history perspective and squashing may break
> other PRs I have in flight. If the intent is clear (one commit with a
> descriptive message and a bunch of "fixups"), then feel free to squash,
> otherwise ask first. When you do squash, it would be good to leave a
> comment as to how the author can avoid having their commits squashed in the
> future.
>
>
> https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E
>
> Andrew
>
> On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw 
>> wrote:
>>
>>> I agree that we should create a good pointer for cleaning up PRs, and
>>> request (though not require) that authors do it. It's unfortunate though
>>> that squashing during a review makes things difficult to follow, so adds
>>> one more round trip.
>>>
>>> We could consider for those PRs that make sense as a single logical
>>> commit (most, but not all, of them) simply using the "squash and merge"
>>> button even though it technically doesn't create a merge commit.
>>>
>>
>> +1 for allowing "squash and merge" as an option. Most of the reviews (at
>> least for me) consist of a single valid commit and several additional
>> commits that get piled up during the review process which obviously should
>> not be included in the commit history. Going through another round here
>> just to ask the author to fixup everything is unnecessarily time consuming.
>>
>> - Cham
>>
>>
>>>
>>>
>>> On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira 
>>> wrote:
>>>
 As a non-committer I think some automated squashing of commits sounds
 best since it lightens the load of regular contributors, by not having to
 always remember to squash, and lightens the load of committers so it
 doesn't take as long to have your PR approved by one.

 But for now I think the second best route would be making it PR
 author's responsibility to squash fixup commits. Having that expectation
 described clearly in the Contributor's Guide, along with some simple
 step-by-step instructions for how to do so should be enough. I mainly
 support this because I've been doing the squashing myself since I saw a
 thread about it here a few months ago. It's not nearly as huge a burden on
 me as it probably is for committers who have to merge in many more PRs,
 it's very easy to learn how to do, and it's one less barrier to having my
 code merged in.

 Of course I wouldn't expect that committers wait for PR authors to
 squash their fixup commits, but I think leaving a message like "For future
 pull requests you should squash any small fixup commits, as described here:
 " should be fine.


> I was also thinking about the possibility of wanting to revert
> individual commits from a merge commit. The solution you propose
> works,
> but only if you want to revert everything.


 Does this happen often? I might not have enough context since I'm not a
 committer, but it seems to me that often the person performing a revert is
 not the original author of a change and doesn't have the context or time to
 pick out an individual commit to revert.

 On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels 
 wrote:

> I tend to agree with you Lukasz. Of course we should try to follow the
> guide lines as much as possible but if it requires an extra back and
> forth with the PR author for a cosmetic change, it may not be worth
> the
> time.
>
> On 19.09.18 22:17, Lukasz Cwik wrote:
> > I have to say I'm guilty of not following the merge guidelines,
> > sometimes doing merges without rebasing/flattening commits.
> >
> > I find that it is a few extra mins of my time to fix someone's PR history
> > if they have more than one logical commit they want to be separate and
> > it usually takes days for the PR author to do merging with the extra
> > burden as a committer to keep track of another PR and its state (waiting
> > for clean-up) is taxing. I really liked the idea of the mergebot
> 

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Andrew Pilloud
I brought up this discussion a few months ago from the other side: I don't
like my commits being squashed. I try to create logical commits that each
passes tests and could be broken up into multiple PRs. Keeping those
changes intact is useful from a history perspective and squashing may break
other PRs I have in flight. If the intent is clear (one commit with a
descriptive message and a bunch of "fixups"), then feel free to squash,
otherwise ask first. When you do squash, it would be good to leave a
comment as to how the author can avoid having their commits squashed in the
future.

https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E

Andrew

On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath 
wrote:

>
>
> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw 
> wrote:
>
>> I agree that we should create a good pointer for cleaning up PRs, and
>> request (though not require) that authors do it. It's unfortunate though
>> that squashing during a review makes things difficult to follow, so adds
>> one more round trip.
>>
>> We could consider for those PRs that make sense as a single logical
>> commit (most, but not all, of them) simply using the "squash and merge"
>> button even though it technically doesn't create a merge commit.
>>
>
> +1 for allowing "squash and merge" as an option. Most of the reviews (at
> least for me) consist of a single valid commit and several additional
> commits that get piled up during the review process which obviously should
> not be included in the commit history. Going through another round here
> just to ask the author to fixup everything is unnecessarily time consuming.
>
> - Cham
>
>
>>
>>
>> On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira 
>> wrote:
>>
>>> As a non-committer I think some automated squashing of commits sounds
>>> best since it lightens the load of regular contributors, by not having to
>>> always remember to squash, and lightens the load of committers so it
>>> doesn't take as long to have your PR approved by one.
>>>
>>> But for now I think the second best route would be making it PR author's
>>> responsibility to squash fixup commits. Having that expectation described
>>> clearly in the Contributor's Guide, along with some simple step-by-step
>>> instructions for how to do so should be enough. I mainly support this
>>> because I've been doing the squashing myself since I saw a thread about it
>>> here a few months ago. It's not nearly as huge a burden on me as it
>>> probably is for committers who have to merge in many more PRs, it's very
>>> easy to learn how to do, and it's one less barrier to having my code merged
>>> in.
>>>
>>> Of course I wouldn't expect that committers wait for PR authors to
>>> squash their fixup commits, but I think leaving a message like "For future
>>> pull requests you should squash any small fixup commits, as described here:
>>> " should be fine.
>>>
>>>
>>>> I was also thinking about the possibility of wanting to revert
>>>> individual commits from a merge commit. The solution you propose works,
>>>> but only if you want to revert everything.
>>>
>>>
>>> Does this happen often? I might not have enough context since I'm not a
>>> committer, but it seems to me that often the person performing a revert is
>>> not the original author of a change and doesn't have the context or time to
>>> pick out an individual commit to revert.
>>>
>>> On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels 
>>> wrote:
>>>
>>>> I tend to agree with you Lukasz. Of course we should try to follow the
>>>> guidelines as much as possible but if it requires an extra back and
>>>> forth with the PR author for a cosmetic change, it may not be worth the
>>>> time.
>>>>
>>>> On 19.09.18 22:17, Lukasz Cwik wrote:
>>>> > I have to say I'm guilty of not following the merge guidelines,
>>>> > sometimes doing merges without rebasing/flattening commits.
>>>> >
>>>> > I find that it is a few extra mins of my time to fix someone's PR history
>>>> > if they have more than one logical commit they want to be separate and
>>>> > it usually takes days for the PR author to do merging with the extra
>>>> > burden as a committer to keep track of another PR and its state (waiting
>>>> > for clean-up) is taxing. I really liked the idea of the mergebot (even
>>>> > though it didn't work out in practice) because it could do all the
>>>> > policy work on my behalf.
>>>> >
>>>> > Anything that reduces my overhead as a committer is useful as for the
>>>> > 100s of PRs that I have merged, I've only had to roll back a couple so
>>>> > I'm for Charles's suggestion which makes the rollback flow slightly more
>>>> > complicated for a significantly easier PR merge workflow.
>>>> >
>>>> > On Wed, Sep 19, 2018 at 1:13 PM Charles Chen wrote:
>>>> >
>>>> > What I mean is that if you get the first-parent commit using "git
>>>> > log --first-parent", it will incorporate

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Chamikara Jayalath
On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw  wrote:

> I agree that we should create a good pointer for cleaning up PRs, and
> request (though not require) that authors do it. It's unfortunate though
> that squashing during a review makes things difficult to follow, so adds
> one more round trip.
>
> We could consider for those PRs that make sense as a single logical commit
> (most, but not all, of them) simply using the "squash and merge" button
> even though it technically doesn't create a merge commit.
>

+1 for allowing "squash and merge" as an option. Most of the reviews (at
least for me) consist of a single valid commit and several additional
commits that get piled up during the review process which obviously should
not be included in the commit history. Going through another round here
just to ask the author to fixup everything is unnecessarily time consuming.

- Cham


>
>
> On Fri, Sep 21, 2018 at 9:24 PM Daniel Oliveira 
> wrote:
>
>> As a non-committer I think some automated squashing of commits sounds
>> best since it lightens the load of regular contributors, by not having to
>> always remember to squash, and lightens the load of committers so it
>> doesn't take as long to have your PR approved by one.
>>
>> But for now I think the second best route would be making it PR author's
>> responsibility to squash fixup commits. Having that expectation described
>> clearly in the Contributor's Guide, along with some simple step-by-step
>> instructions for how to do so should be enough. I mainly support this
>> because I've been doing the squashing myself since I saw a thread about it
>> here a few months ago. It's not nearly as huge a burden on me as it
>> probably is for committers who have to merge in many more PRs, it's very
>> easy to learn how to do, and it's one less barrier to having my code merged
>> in.
>>
>> Of course I wouldn't expect that committers wait for PR authors to squash
>> their fixup commits, but I think leaving a message like "For future pull
>> requests you should squash any small fixup commits, as described here:
>> " should be fine.
>>
>>
>>> I was also thinking about the possibility of wanting to revert
>>> individual commits from a merge commit. The solution you propose works,
>>> but only if you want to revert everything.
>>
>>
>> Does this happen often? I might not have enough context since I'm not a
>> committer, but it seems to me that often the person performing a revert is
>> not the original author of a change and doesn't have the context or time to
>> pick out an individual commit to revert.
>>
>> On Wed, Sep 19, 2018 at 1:32 PM Maximilian Michels 
>> wrote:
>>
>>> I tend to agree with you Lukasz. Of course we should try to follow the
>>> guidelines as much as possible but if it requires an extra back and
>>> forth with the PR author for a cosmetic change, it may not be worth the
>>> time.
>>>
>>> On 19.09.18 22:17, Lukasz Cwik wrote:
>>> > I have to say I'm guilty of not following the merge guidelines,
>>> > sometimes doing merges without rebasing/flattening commits.
>>> >
>>> > I find that it is a few extra mins of my time to fix someone's PR history
>>> > if they have more than one logical commit they want to be separate and
>>> > it usually takes days for the PR author to do merging with the extra
>>> > burden as a committer to keep track of another PR and its state (waiting
>>> > for clean-up) is taxing. I really liked the idea of the mergebot (even
>>> > though it didn't work out in practice) because it could do all the
>>> > policy work on my behalf.
>>> >
>>> > Anything that reduces my overhead as a committer is useful as for the
>>> > 100s of PRs that I have merged, I've only had to roll back a couple so
>>> > I'm for Charles's suggestion which makes the rollback flow slightly more
>>> > complicated for a significantly easier PR merge workflow.
>>> >
>>> > On Wed, Sep 19, 2018 at 1:13 PM Charles Chen wrote:
>>> >
>>> > What I mean is that if you get the first-parent commit using "git
>>> > log --first-parent", it will incorporate any and all fix up commits
>>> > so we don't need to worry about missing any.
>>> >
>>> > On Wed, Sep 19, 2018, 1:07 PM Maximilian Michels wrote:
>>> >
>>> > Generally, +1 for isolated commits which are easy to revert.
>>> >
>>> >  > I don't think it's actually harder to roll back a set of
>>> >  > commits that are merged together.
>>> > I think Thomas was mainly concerned about "fixup" commits to land in
>>> > master (as part of a merge). These indeed make reverting commits more
>>> > difficult because you have to check whether you missed a "fixup".
>>> >
>>> >  > Ideally every commit should compile and pass tests though, right?
>>> >
>>> > That is definitely what we should strive for when doing a merge against
>>> >

Re: Portable Flink runner: Generator source for testing

2018-09-28 Thread Łukasz Gajowy
Hi all,

thank you, Thomas, for starting this discussion and Pablo for sharing the
ideas. FWIW, we discussed this in the context of the Core Beam Transform
Load Tests that we are working on right now [1]. If generating synthetic
data becomes possible for portable streaming pipelines, we could use it in
our work to test Python streaming scenarios.

[1] https://s.apache.org/GVMa

On Fri, Sep 28, 2018 at 08:18 Pablo Estrada wrote:

> Hi Thomas, all,
> yes, this is quite important for testing, and in fact I'd think it's
> important to streamline the insertion of native sources from different
> runners to make the current runners more usable. But that's another topic.
>
> For generators of synthetic data, I had a couple ideas (and this will show
> my limited knowledge about Flink and Streaming, but oh well):
>
> - Flink experts: Is it possible to add a pure-Beam generator that will do
> something like: Impulse -> ParDo(generate multiple elements) -> Forced
> "Write" to Flink (e.g. something like a reshuffle), and then have Flink
> manage the parallelism for stages downstream from that?
>
> - If this is not possible, it may be worth writing some transform in Flink
> / other runners that can be plugged in by inserting a custom URN. In fact,
> it may be a good idea to streamline the insertion of native sources for
> each runner based on some sort of CustomURNTransform() ?
>
> I hope I did not butcher those explanations too badly...
> Best
> -P.
>
> On Thu, Sep 27, 2018, 5:55 PM Thomas Weise  wrote:
>
>> There were a few discussions about how we can facilitate testing for
>> portable streaming pipelines with the Flink runner. The problem is that we
>> currently don't have streaming sources in the Python SDK.
>>
>> One way to support testing could be a generator that extends the idea of
>> Impulse to provide a Flink native trigger transform, optionally
>> parameterized with an interval and max count.
>>
>> Test pipelines could then follow the generator with a Map function that
>> creates whatever payloads are desirable.
>>
>> Thoughts?
>>
>> Thanks,
>> Thomas
>>
>>


Re: Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-28 Thread Tim Robertson
Thanks for sharing those results.

The second set (executors at 20-30) looks similar to what I would have
expected.
BEAM-5036 definitely plays a part here, as the data is not moved on HDFS
efficiently (a fix is in a PR awaiting review now [1]).

To give an idea of the impact, here are some numbers from my own tests.
Without knowing your code, I presume mine is similar to your filter (take
data, modify it, write data with no shuffle/group/join).

My environment: a 10-node YARN CDH 5.12.2 cluster, rewriting a 1.5TB AvroIO
file (code here [2]). I observed:

  - Using Spark API: 35 minutes
  - Beam AvroIO (2.6.0): 1.7hrs
  - Beam AvroIO with the 5036 fix: 42 minutes
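
For reference, the ratios implied by these three runtimes can be worked out
with a short sketch. The dictionary keys and the helper name are
illustrative only and are not taken from the benchmark code in [2]; 1.7hrs
is converted to 102 minutes.

```python
# Runtimes from the list above, in minutes (1.7hrs = 102 minutes).
RUNTIMES_MIN = {
    "Spark API": 35,
    "Beam AvroIO (2.6.0)": 102,
    "Beam AvroIO with the 5036 fix": 42,
}


def ratio(slower, faster):
    """Return how many times longer `slower` took than `faster`."""
    return RUNTIMES_MIN[slower] / RUNTIMES_MIN[faster]


print(f"Beam 2.6.0 vs Spark API:      {ratio('Beam AvroIO (2.6.0)', 'Spark API'):.2f}x")
print(f"Beam with fix vs Spark API:   {ratio('Beam AvroIO with the 5036 fix', 'Spark API'):.2f}x")
print(f"Beam 2.6.0 vs Beam with fix:  {ratio('Beam AvroIO (2.6.0)', 'Beam AvroIO with the 5036 fix'):.2f}x")
```

On these numbers the fix takes the Beam job from roughly 2.9x to 1.2x the
Spark API runtime.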

Related: I also anticipate that varying the spark.default.parallelism will
affect Beam runtime.

Thanks,
Tim


[1] https://github.com/apache/beam/pull/6289
[2] https://github.com/gbif/beam-perf/tree/master/avro-to-avro


On Fri, Sep 28, 2018 at 9:27 AM Robert Bradshaw  wrote:

> Something here on the Beam side is clearly linear in the input size, as if
> there's a bottleneck where we're not able to get any parallelization. Is
> the Spark variant running in parallel?
>
> On Fri, Sep 28, 2018 at 4:57 AM devinduan(段丁瑞) 
> wrote:
>
>> Hi
>> I have completed my test.
>> 1. Spark parameter :
>> deploy-mode client
>> executor-memory 1g
>> num-executors 1
>> driver-memory 1g
>>
>> WordCount:
>>
>>          300MB    600MB     1.2G
>> Spark    1min8s   1min11s   1min18s
>> Beam     6.4min   11min     22min
>>
>> Filter:
>>
>>          300MB    600MB    1.2G
>> Spark    1.2min   1.7min   2.8min
>> Beam     2.7min   4.1min   5.7min
>>
>> GroupbyKey + sum
>>
>>          300MB                  600MB   1.2G
>> Spark    3.6min
>> Beam     Failed, executor oom
>>
>> Union
>>
>>          300MB    600MB    1.2G
>> Spark    1.7min   2.6min   5.1min
>> Beam     3.6min   6.2min   11min
>>
>> 2. Spark parameter :
>> deploy-mode client
>> executor-memory 1g
>> driver-memory 1g
>> spark.dynamicAllocation.enabled true
>>
>


Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #186

2018-09-28 Thread Apache Jenkins Server
See 


Changes:

[shaun] added avroio package

[shaun] updated read emits to support both string and custom type reflects

[shaun] added avro write support

[mergebot] [BEAM-5436] Improve docs for Go SDK

[amaliujia] [BEAM-5506] Add reference link.

[shaun] updated to be in-line with beam project specifications

[shaun] update package log prints

[shaun] added readavro example

[shaun] updated example package header

[shaun] removed output.avro file

[kedin] Fix Java11 Jira link

[daniel.o.programmer] [BEAM-5304] Adding ReferenceRunner Job Server Gradle subproject.

[robertwb] [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.

[kevinsi] When getting display data from a runtime parameter, don't call get().

[kevinsi] Randomize the reduced splits in BigtableIO so that multiple workers may

[scott] [BEAM-5518] Ignore failing ssl validation of globenewswire (#6502)

[pablo] Updating Dataflow API protocol buffers

[altay] update dataflow container name

--
[...truncated 24.37 MB...]
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hadoop-input-format/2.8.0-SNAPSHOT/beam-sdks-java-io-hadoop-input-format-2.8.0-20180928.081245-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

46: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-hbase:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hbase/2.8.0-SNAPSHOT/beam-sdks-java-io-hbase-2.8.0-20180928.081256-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hbase/2.8.0-SNAPSHOT/beam-sdks-java-io-hbase-2.8.0-20180928.081256-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

47: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-hcatalog:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hcatalog/2.8.0-SNAPSHOT/beam-sdks-java-io-hcatalog-2.8.0-20180928.081304-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-hcatalog/2.8.0-SNAPSHOT/beam-sdks-java-io-hcatalog-2.8.0-20180928.081304-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

48: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-jdbc:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jdbc/2.8.0-SNAPSHOT/beam-sdks-java-io-jdbc-2.8.0-20180928.081312-14.jar'.
  > Could not PUT 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jdbc/2.8.0-SNAPSHOT/beam-sdks-java-io-jdbc-2.8.0-20180928.081312-14.jar'.
 Received status code 401 from server: Unauthorized

* Try:
Run with --stacktrace option to get the stack trace. Run with --debug option to 
get more log output. Run with --scan to get full insights.
==

49: Task failed with an exception.
---
* What went wrong:
Execution failed for task 
':beam-sdks-java-io-jms:publishMavenJavaPublicationToMavenRepository'.
> Failed to publish publication 'mavenJava' to repository 'maven'
   > Could not write to resource 
'https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-io-jms/2.8.0-SNAPSHOT/beam-sdks-java-io-jms-2.8.0-20180928.081319-14.jar'.
  > Could not PUT 

Re: Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-28 Thread Robert Bradshaw
Something here on the Beam side is clearly linear in the input size, as if
there's a bottleneck where we're not able to get any parallelization. Is
the Spark variant running in parallel?
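
To make the scaling concrete, here is a small sketch that computes how much
the WordCount runtime grows each time the input doubles, using the timings
from the table quoted below. It assumes "1.2G" can be treated as 1200MB so
that each step is a doubling; the constant and function names are
illustrative.

```python
# WordCount timings from the quoted table (single executor), in seconds.
# Assumption: "1.2G" is treated as 1200 MB, so the input doubles each step.
SIZES_MB = [300, 600, 1200]
WORDCOUNT_SECONDS = {
    "Spark": [68, 71, 78],                 # 1min8s, 1min11s, 1min18s
    "Beam": [6.4 * 60, 11 * 60, 22 * 60],  # 6.4min, 11min, 22min
}


def growth_factors(times):
    """Return times[i + 1] / times[i] for each consecutive pair of runs."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


for engine, times in WORDCOUNT_SECONDS.items():
    factors = ", ".join(f"{f:.2f}x" for f in growth_factors(times))
    print(f"{engine}: runtime growth per doubling of input: {factors}")
```

The Beam runtime grows by about 1.72x and then 2.00x per doubling, while
the Spark runtime grows by only about 1.04x and 1.10x.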

On Fri, Sep 28, 2018 at 4:57 AM devinduan(段丁瑞) 
wrote:

> Hi
> I have completed my test.
> 1. Spark parameter :
> deploy-mode client
> executor-memory 1g
> num-executors 1
> driver-memory 1g
>
> WordCount:
>
>          300MB    600MB     1.2G
> Spark    1min8s   1min11s   1min18s
> Beam     6.4min   11min     22min
>
> Filter:
>
>          300MB    600MB    1.2G
> Spark    1.2min   1.7min   2.8min
> Beam     2.7min   4.1min   5.7min
>
> GroupbyKey + sum
>
>          300MB                  600MB   1.2G
> Spark    3.6min
> Beam     Failed, executor oom
>
> Union
>
>          300MB    600MB    1.2G
> Spark    1.7min   2.6min   5.1min
> Beam     3.6min   6.2min   11min
>
> 2. Spark parameter :
> deploy-mode client
> executor-memory 1g
> driver-memory 1g
> spark.dynamicAllocation.enabled true
>


Re: Portable Flink runner: Generator source for testing

2018-09-28 Thread Pablo Estrada
Hi Thomas, all,
yes, this is quite important for testing, and in fact I'd think it's
important to streamline the insertion of native sources from different
runners to make the current runners more usable. But that's another topic.

For generators of synthetic data, I had a couple ideas (and this will show
my limited knowledge about Flink and Streaming, but oh well):

- Flink experts: Is it possible to add a pure-Beam generator that will do
something like: Impulse -> ParDo(generate multiple elements) -> Forced
"Write" to Flink (e.g. something like a reshuffle), and then have Flink
manage the parallelism for stages downstream from that?

- If this is not possible, it may be worth writing some transform in Flink
/ other runners that can be plugged in by inserting a custom URN. In fact,
it may be a good idea to streamline the insertion of native sources for
each runner based on some sort of CustomURNTransform() ?
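
As a rough illustration of the first idea, here is a minimal sketch in the
Python SDK, assuming that a Reshuffle is enough to stand in for the forced
"Write". Whether the Flink runner then spreads the downstream stages across
workers is exactly the open question above. The names generate_elements and
build_generator are made up for this sketch.

```python
def generate_elements(_, count):
    """Fan the single Impulse element out into `count` synthetic elements."""
    for i in range(count):
        yield i


def build_generator(pipeline, count=1000):
    """Impulse -> FlatMap(generate) -> Reshuffle, returning the PCollection."""
    # Imported here so generate_elements can be exercised without Beam installed.
    import apache_beam as beam

    return (
        pipeline
        | "Impulse" >> beam.Impulse()
        | "Generate" >> beam.FlatMap(generate_elements, count)
        # Assumption: the reshuffle acts as the redistribution point.
        | "Redistribute" >> beam.Reshuffle()
    )
```

A test pipeline would then apply its own Map after build_generator(p) to
turn the integers into whatever payloads it needs.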

I hope I did not butcher those explanations too badly...
Best
-P.

On Thu, Sep 27, 2018, 5:55 PM Thomas Weise  wrote:

> There were a few discussions about how we can facilitate testing for
> portable streaming pipelines with the Flink runner. The problem is that we
> currently don't have streaming sources in the Python SDK.
>
> One way to support testing could be a generator that extends the idea of
> Impulse to provide a Flink native trigger transform, optionally
> parameterized with an interval and max count.
>
> Test pipelines could then follow the generator with a Map function that
> creates whatever payloads are desirable.
>
> Thoughts?
>
> Thanks,
> Thomas
>
>