Re: [ANNOUNCE] New committer: Yichi Zhang

2021-04-21 Thread Ahmet Altay
Congratulations Yichi! 

On Wed, Apr 21, 2021 at 6:48 PM Chamikara Jayalath 
wrote:

> Congrats Yichi!
>
> On Wed, Apr 21, 2021 at 6:14 PM Heejong Lee  wrote:
>
>> Congratulations :)
>>
>> On Wed, Apr 21, 2021 at 5:20 PM Tomo Suzuki  wrote:
>>
>>> Congratulations!
>>>
>>> On Wed, Apr 21, 2021 at 7:48 PM Tyson Hamilton 
>>> wrote:
>>>
 Congrats!

 On Wed, Apr 21, 2021 at 4:37 PM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Well deserved and congrats, Yichi!
>
> On Wed, Apr 21, 2021 at 4:23 PM Pablo Estrada 
> wrote:
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Yichi Zhang
>>
>> Yichi has been working in Beam for a while. He has contributed to various
>> areas, including Nexmark tests, test health, and Python's streaming
>> capabilities; he has also answered questions on StackOverflow and helped
>> with release validations, among many other contributions to the Beam
>> community.
>>
>> Considering these contributions, the Beam PMC trusts Yichi with the
>> responsibilities of a Beam committer.[1]
>>
>> Thanks Yichi!
>> -P.
>>
>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>


Re: [ANNOUNCE] New committer: Yichi Zhang

2021-04-21 Thread Chamikara Jayalath
Congrats Yichi!

On Wed, Apr 21, 2021 at 6:14 PM Heejong Lee  wrote:

> Congratulations :)
>
> On Wed, Apr 21, 2021 at 5:20 PM Tomo Suzuki  wrote:
>
>> Congratulations!
>>
>> On Wed, Apr 21, 2021 at 7:48 PM Tyson Hamilton 
>> wrote:
>>
>>> Congrats!
>>>
>>> On Wed, Apr 21, 2021 at 4:37 PM Valentyn Tymofieiev 
>>> wrote:
>>>
 Well deserved and congrats, Yichi!

 On Wed, Apr 21, 2021 at 4:23 PM Pablo Estrada 
 wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Yichi Zhang
>
> Yichi has been working in Beam for a while. He has contributed to various
> areas, including Nexmark tests, test health, and Python's streaming
> capabilities; he has also answered questions on StackOverflow and helped
> with release validations, among many other contributions to the Beam
> community.
>
> Considering these contributions, the Beam PMC trusts Yichi with the
> responsibilities of a Beam committer.[1]
>
> Thanks Yichi!
> -P.
>
> [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>

>>
>> --
>> Regards,
>> Tomo
>>
>


Re: [ANNOUNCE] New committer: Yichi Zhang

2021-04-21 Thread Tyson Hamilton
Congrats!

On Wed, Apr 21, 2021 at 4:37 PM Valentyn Tymofieiev 
wrote:

> Well deserved and congrats, Yichi!
>
> On Wed, Apr 21, 2021 at 4:23 PM Pablo Estrada  wrote:
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new committer:
>> Yichi Zhang
>>
>> Yichi has been working in Beam for a while. He has contributed to various
>> areas, including Nexmark tests, test health, and Python's streaming
>> capabilities; he has also answered questions on StackOverflow and helped
>> with release validations, among many other contributions to the Beam
>> community.
>>
>> Considering these contributions, the Beam PMC trusts Yichi with the
>> responsibilities of a Beam committer.[1]
>>
>> Thanks Yichi!
>> -P.
>>
>> [1]
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>


Re: [PROPOSAL] Preparing for Beam 2.30.0 release

2021-04-21 Thread Tomo Suzuki
Thank you for the preparation!

> a few responses that some high priority changes

Would you be willing to share the items for visibility?


On Wed, Apr 21, 2021 at 7:21 PM Kenneth Knowles  wrote:
>
> Also, the 2.29.0 release was re-cut.
>
> Usually a delay in one release should not delay the next release, because 
> each release represents a certain quantity of changes. But in this case, the 
> actual quantity of changes is affected by the re-cut, too.
>
> On Wed, Apr 21, 2021 at 4:12 PM Heejong Lee  wrote:
>>
>> Update on the 2.30.0 branch cut schedule:
>>
>> I'm thinking of delaying the branch cut by a week, since I've got a few
>> responses indicating that some high-priority changes are still ongoing.
>>
>> The new cut date is April 28.
>>
>>
>> On Tue, Apr 20, 2021 at 6:07 PM Ahmet Altay  wrote:
>>>
>>> +1 and thank you!
>>>
>>> On Tue, Apr 20, 2021 at 4:55 PM Heejong Lee  wrote:

 Hi All,

 Beam 2.30.0 release is scheduled to be cut on April 21 according to the 
 release calendar [1]

 I'd like to volunteer myself to be the release manager for this release. I 
 plan on cutting the release branch on the scheduled date.

 Any comments or objections?

 Thanks,
 Heejong

 [1] 
 https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles



--
Regards,
Tomo


Re: [ANNOUNCE] New committer: Yichi Zhang

2021-04-21 Thread Valentyn Tymofieiev
Well deserved and congrats, Yichi!

On Wed, Apr 21, 2021 at 4:23 PM Pablo Estrada  wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new committer:
> Yichi Zhang
>
> Yichi has been working in Beam for a while. He has contributed to various
> areas, including Nexmark tests, test health, and Python's streaming
> capabilities; he has also answered questions on StackOverflow and helped
> with release validations, among many other contributions to the Beam
> community.
>
> Considering these contributions, the Beam PMC trusts Yichi with the
> responsibilities of a Beam committer.[1]
>
> Thanks Yichi!
> -P.
>
> [1]
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>


Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-21 Thread Robert Bradshaw
I am also in the camp that it often makes sense to have more than 1 commit
per PR. But rather than enforce a 1-commit-per-PR policy, I would say: if
it's too much bother to look at the commit history to decide whether it
should be squashed or merged (though I think it is almost always very
obvious which is preferable for a given PR), go ahead and squash rather
than merge by default.


On Wed, Apr 21, 2021 at 2:23 PM Kenneth Knowles  wrote:

> This seems to come up a lot. Maybe we should change something?
>
> Having worked on a number of projects and at companies with this policy,
> companies using non-distributed source control, and companies that just
> "use git like git", I know all these ways of life pretty well.
>
> TL;DR my experience is:
>  - when people care about the commit history and take care of it, then
> just "use git like git" results in faster development and clearer history,
> despite intermediate commits not being tested by Jenkins/Travis/GHA
>  - when people see git as an inconvenience, view the history as an
> implementation detail, or think in linear history of PR merges and view the
> commits as an implementation detail, it becomes a mess
>
> Empirically, this is what I expect from a 1 commit = 1 PR policy (and how
> I feel about each point):
>  - fewer commits with bad messages (yay!)
>  - simpler git graph if we squash + rebase (meh)
>  - larger commits of related-but-independent changes (could be OK)
>  - commits with bullet points in their description that bundle unrelated
> changes (sad face)
>  - slowdown of development (neutral - slow can be good)
>  - fewer "quality of life" improvements, since those would add lines of
> diff to a PR and are off topic; when they have to be done in a separate PR
> they don't get done and they don't get reviewed with the same priority
> (extra sad face)
>
> I know I am in the minority. I tend to have a lot of PRs where there
> are 2-5 fairly independent commits. It is "to aid code review" but not in
> the way you might think: The best size for code review is pretty big,
> compared to the best size for commit. A commit is the unit of roll-forward,
> roll-back, cherry-pick, etc. Brian's point about commits not being
> independently tested is important: this is a tooling issue, but not that
> easy to change. Here is why I am not that worried about it: I believe
> strongly in a "rollback first" policy to restore greenness, but also that
> the rollback change itself must be verified to restore greenness. When a
> multi-commit PR fails, you can easily open a revert of the whole PR as well
> as reverts of individual suspect commits. The CI for these will finish
> around the same time, and if you manage a smaller revert, great! Imagine if
> to revert a PR you had to revert _every_ change between HEAD and that PR.
> It would restore to a known green state. Yet we don't do this, because we
> have technology that makes it unnecessary. Ultimately, single large commits
> with bullet points are just an unstructured version of multi-commit PRs. So
> I favor the structure. But people seem to be more likely to write good
> bullet points than to write independent commits. Perhaps because it is
> easier.
>
> So at this point, I think I am OK with a 1 commit per PR policy. I think
> the net benefits to our commit history would be good. I have grown tired of
> repeating the conversation. Rebase-and-squash edits commit ids in ways that
> confuse tools, so I do not favor this. Tooling that merges one commit at a
> time (without altering commit id) would also be super cool and not that
> hard. It would prevent intermediate results from merging, solving both
> problems.
>
> Kenn
>
>
> On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette  wrote:
>
>> I'd argue that the history is almost always "most useful" when one PR ==
>> one commit on master. Intermediate commits from a PR may be useful to aid
>> code review, but they're not verified by presubmits and thus aren't
>> necessarily independently revertible, so I see little value in keeping them
>> around on master. In fact if you're breaking up a PR into multiple commits
>> to aid code review, it's worth considering if they could/should be
>> separately reviewed and verified PRs.
>> We could solve the unwanted commit issue if we have a policy to always
>> "Squash and Merge" PRs with rare exceptions.
>>
>> I agree jira/PR titles could be better, I'm not sure what we can do about
>> it aside from reminding committers of this responsibility. Perhaps the
>> triage process can help catch poorly titled jiras?
>>
>> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw 
>> wrote:
>>
>>> +1 to better descriptions for JIRA (and PRs). Thanks for bringing this
>>> up.
>>>
>>> For merging unwanted commits, can we automate a simple check (e.g. with
>>> github actions)?
>>>
>>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki  wrote:
>>>
 BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide
 [1], I see I was not following this


[ANNOUNCE] New committer: Yichi Zhang

2021-04-21 Thread Pablo Estrada
Hi all,

Please join me and the rest of the Beam PMC in welcoming a new committer:
Yichi Zhang

Yichi has been working in Beam for a while. He has contributed to various
areas, including Nexmark tests, test health, and Python's streaming
capabilities; he has also answered questions on StackOverflow and helped
with release validations, among many other contributions to the Beam
community.

Considering these contributions, the Beam PMC trusts Yichi with the
responsibilities of a Beam committer.[1]

Thanks Yichi!
-P.

[1]
https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer


Re: [PROPOSAL] Preparing for Beam 2.30.0 release

2021-04-21 Thread Kenneth Knowles
Also, the 2.29.0 release was re-cut.

Usually a delay in one release should not delay the next release, because
each release represents a certain quantity of changes. But in this case,
the actual quantity of changes is affected by the re-cut, too.

On Wed, Apr 21, 2021 at 4:12 PM Heejong Lee  wrote:

> Update on the 2.30.0 branch cut schedule:
>
> I'm thinking of delaying the branch cut by a week, since I've got a few
> responses indicating that some high-priority changes are still ongoing.
>
> The new cut date is April 28.
>
>
> On Tue, Apr 20, 2021 at 6:07 PM Ahmet Altay  wrote:
>
>> +1 and thank you!
>>
>> On Tue, Apr 20, 2021 at 4:55 PM Heejong Lee  wrote:
>>
>>> Hi All,
>>>
>>> Beam 2.30.0 release is scheduled to be cut on April 21 according to the
>>> release calendar [1]
>>>
>>> I'd like to volunteer myself to be the release manager for this release.
>>> I plan on cutting the release branch on the scheduled date.
>>>
>>> Any comments or objections?
>>>
>>> Thanks,
>>> Heejong
>>>
>>> [1]
>>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles
>>>
>>


Re: [PROPOSAL] Preparing for Beam 2.30.0 release

2021-04-21 Thread Heejong Lee
Update on the 2.30.0 branch cut schedule:

I'm thinking of delaying the branch cut by a week, since I've got a few
responses indicating that some high-priority changes are still ongoing.

The new cut date is April 28.


On Tue, Apr 20, 2021 at 6:07 PM Ahmet Altay  wrote:

> +1 and thank you!
>
> On Tue, Apr 20, 2021 at 4:55 PM Heejong Lee  wrote:
>
>> Hi All,
>>
>> Beam 2.30.0 release is scheduled to be cut on April 21 according to the
>> release calendar [1]
>>
>> I'd like to volunteer myself to be the release manager for this release.
>> I plan on cutting the release branch on the scheduled date.
>>
>> Any comments or objections?
>>
>> Thanks,
>> Heejong
>>
>> [1]
>> https://calendar.google.com/calendar/u/0/embed?src=0p73sl034k80oob7seouani...@group.calendar.google.com&ctz=America/Los_Angeles
>>
>


Flaky test issue report

2021-04-21 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests. These are P1 issues 
because they have a major negative impact on the community and make it hard to 
determine the quality of the software.

BEAM-12200: SamzaStoreStateInternalsTest is flaky 
(https://issues.apache.org/jira/browse/BEAM-12200)
BEAM-12163: Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK 
harness startup (https://issues.apache.org/jira/browse/BEAM-12163)
BEAM-12061: beam_PostCommit_SQL failing on 
KafkaTableProviderIT.testFakeNested 
(https://issues.apache.org/jira/browse/BEAM-12061)
BEAM-12020: :sdks:java:container:java8:docker failing missing licenses 
(https://issues.apache.org/jira/browse/BEAM-12020)
BEAM-12019: 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky (https://issues.apache.org/jira/browse/BEAM-12019)
BEAM-11792: Python precommit failed (flaked?) installing package  
(https://issues.apache.org/jira/browse/BEAM-11792)
BEAM-11733: [beam_PostCommit_Java] [testFhirIO_Import|export] flaky 
(https://issues.apache.org/jira/browse/BEAM-11733)
BEAM-11666: 
apache_beam.runners.interactive.recording_manager_test.RecordingManagerTest.test_basic_execution
 is flaky (https://issues.apache.org/jira/browse/BEAM-11666)
BEAM-11662: elasticsearch tests failing 
(https://issues.apache.org/jira/browse/BEAM-11662)
BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) 
(https://issues.apache.org/jira/browse/BEAM-11661)
BEAM-11646: beam_PostCommit_XVR_Spark failing 
(https://issues.apache.org/jira/browse/BEAM-11646)
BEAM-11645: beam_PostCommit_XVR_Flink failing 
(https://issues.apache.org/jira/browse/BEAM-11645)
BEAM-11541: testTeardownCalledAfterExceptionInProcessElement flakes on 
direct runner. (https://issues.apache.org/jira/browse/BEAM-11541)
BEAM-11540: Linter sometimes flakes on apache_beam.dataframe.frames_test 
(https://issues.apache.org/jira/browse/BEAM-11540)
BEAM-11493: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyAndWindows
 (https://issues.apache.org/jira/browse/BEAM-11493)
BEAM-11492: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMergingWindows
 (https://issues.apache.org/jira/browse/BEAM-11492)
BEAM-11491: Spark test failure: 
org.apache.beam.sdk.transforms.GroupByKeyTest$WindowTests.testGroupByKeyMultipleWindows
 (https://issues.apache.org/jira/browse/BEAM-11491)
BEAM-11490: Spark test failure: 
org.apache.beam.sdk.transforms.ReifyTimestampsTest.inValuesSucceeds 
(https://issues.apache.org/jira/browse/BEAM-11490)
BEAM-11489: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-11489)
BEAM-11488: Spark test failure: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedCounterMetrics
 (https://issues.apache.org/jira/browse/BEAM-11488)
BEAM-11487: Spark test failure: 
org.apache.beam.sdk.transforms.WithTimestampsTest.withTimestampsShouldApplyTimestamps
 (https://issues.apache.org/jira/browse/BEAM-11487)
BEAM-11486: Spark test failure: 
org.apache.beam.sdk.testing.PAssertTest.testSerializablePredicate 
(https://issues.apache.org/jira/browse/BEAM-11486)
BEAM-11485: Spark test failure: 
org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineNullValues 
(https://issues.apache.org/jira/browse/BEAM-11485)
BEAM-11484: Spark test failure: 
org.apache.beam.runners.core.metrics.MetricsPusherTest.pushesUserMetrics 
(https://issues.apache.org/jira/browse/BEAM-11484)
BEAM-11483: Spark portable streaming PostCommit Test Improvements 
(https://issues.apache.org/jira/browse/BEAM-11483)
BEAM-10995: Java + Universal Local Runner: 
WindowingTest.testWindowPreservation fails 
(https://issues.apache.org/jira/browse/BEAM-10995)
BEAM-10987: stager_test.py::StagerTest::test_with_main_session flaky on 
windows py3.6,3.7 (https://issues.apache.org/jira/browse/BEAM-10987)
BEAM-10968: flaky test: 
org.apache.beam.sdk.metrics.MetricsTest$AttemptedMetricTests.testAttemptedDistributionMetrics
 (https://issues.apache.org/jira/browse/BEAM-10968)
BEAM-10955: Flink Java Runner test flake: Could not find Flink job  
(https://issues.apache.org/jira/browse/BEAM-10955)
BEAM-10923: Python requirements installation in docker container is flaky 
(https://issues.apache.org/jira/browse/BEAM-10923)
BEAM-10899: test_FhirIO_exportFhirResourcesGcs flake with OOM 
(https://issues.apache.org/jira/browse/BEAM-10899)
BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations 
flaky on macOS (https://issues.apache.org/jira/browse/BEAM-10866)
BEAM-10763: Spotless flake (NullPointerException) 
(https://issues.apache.org/jira/browse/BEAM-10763)
BEAM-10590: BigQueryQueryToTableIT flaky: test_big_query_new_types 

P1 issues report

2021-04-21 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the 
meaning and expectations around P1 issues.

BEAM-12205: Dataflow pipelines broken NoSuchMethodError 
DoFnInvoker.invokeSetup() (https://issues.apache.org/jira/browse/BEAM-12205)
BEAM-12195: Flink Runner 1.11 uses old Scala-Version 
(https://issues.apache.org/jira/browse/BEAM-12195)
BEAM-11959: Python Beam SDK Harness hangs when installing pip packages 
(https://issues.apache.org/jira/browse/BEAM-11959)
BEAM-11906: No trigger early repeatedly for session windows 
(https://issues.apache.org/jira/browse/BEAM-11906)
BEAM-11875: XmlIO.Read does not handle XML encoding per spec 
(https://issues.apache.org/jira/browse/BEAM-11875)
BEAM-11828: JmsIO is not acknowledging messages correctly 
(https://issues.apache.org/jira/browse/BEAM-11828)
BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly 
broken (at least on portable flink runner) 
(https://issues.apache.org/jira/browse/BEAM-11755)
BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int 
overflowing?) (https://issues.apache.org/jira/browse/BEAM-11578)
BEAM-11576: Go ValidatesRunner failure: TestFlattenDup on Dataflow Runner 
(https://issues.apache.org/jira/browse/BEAM-11576)
BEAM-11434: Expose Spanner admin/batch clients in Spanner Accessor 
(https://issues.apache.org/jira/browse/BEAM-11434)
BEAM-11227: Upgrade beam-vendor-grpc-1_26_0-0.3 to fix CVE-2020-27216 
(https://issues.apache.org/jira/browse/BEAM-11227)
BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink 
(https://issues.apache.org/jira/browse/BEAM-11148)
BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow 
runner) (https://issues.apache.org/jira/browse/BEAM-11017)
BEAM-10861: Adds URNs and payloads to PubSub transforms 
(https://issues.apache.org/jira/browse/BEAM-10861)
BEAM-10617: python CombineGlobally().with_fanout() cause duplicate combine 
results for sliding windows (https://issues.apache.org/jira/browse/BEAM-10617)
BEAM-10573: CSV files are loaded several times if they are too large 
(https://issues.apache.org/jira/browse/BEAM-10573)
BEAM-10569: SpannerIO tests don't actually assert anything. 
(https://issues.apache.org/jira/browse/BEAM-10569)
BEAM-10288: Quickstart documents are out of date 
(https://issues.apache.org/jira/browse/BEAM-10288)
BEAM-10244: Populate requirements cache fails on poetry-based packages 
(https://issues.apache.org/jira/browse/BEAM-10244)
BEAM-10100: FileIO writeDynamic with AvroIO.sink not writing all data 
(https://issues.apache.org/jira/browse/BEAM-10100)
BEAM-9564: Remove insecure ssl options from MongoDBIO 
(https://issues.apache.org/jira/browse/BEAM-9564)
BEAM-9455: Environment-sensitive provisioning for Dataflow 
(https://issues.apache.org/jira/browse/BEAM-9455)
BEAM-9293: Python direct runner doesn't emit empty pane when it should 
(https://issues.apache.org/jira/browse/BEAM-9293)
BEAM-8986: SortValues may not work correct for numerical types 
(https://issues.apache.org/jira/browse/BEAM-8986)
BEAM-8985: SortValues should fail if SecondaryKey coder is not 
deterministic (https://issues.apache.org/jira/browse/BEAM-8985)
BEAM-8407: [SQL] Some Hive tests throw NullPointerException, but get marked 
as passing (Direct Runner) (https://issues.apache.org/jira/browse/BEAM-8407)
BEAM-7717: PubsubIO watermark tracking hovers near start of epoch 
(https://issues.apache.org/jira/browse/BEAM-7717)
BEAM-7716: PubsubIO returns empty message bodies for all messages read 
(https://issues.apache.org/jira/browse/BEAM-7716)
BEAM-7195: BigQuery - 404 errors for 'table not found' when using dynamic 
destinations - sometimes, new table fails to get created 
(https://issues.apache.org/jira/browse/BEAM-7195)
BEAM-6839: User reports protobuf ClassChangeError running against 2.6.0 or 
above (https://issues.apache.org/jira/browse/BEAM-6839)
BEAM-6466: KafkaIO doesn't commit offsets while being used as bounded 
source (https://issues.apache.org/jira/browse/BEAM-6466)


Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-21 Thread Kenneth Knowles
This seems to come up a lot. Maybe we should change something?

Having worked on a number of projects and at companies with this policy,
companies using non-distributed source control, and companies that just
"use git like git", I know all these ways of life pretty well.

TL;DR my experience is:
 - when people care about the commit history and take care of it, then just
"use git like git" results in faster development and clearer history,
despite intermediate commits not being tested by Jenkins/Travis/GHA
 - when people see git as an inconvenience, view the history as an
implementation detail, or think in linear history of PR merges and view the
commits as an implementation detail, it becomes a mess

Empirically, this is what I expect from a 1 commit = 1 PR policy (and how I
feel about each point):
 - fewer commits with bad messages (yay!)
 - simpler git graph if we squash + rebase (meh)
 - larger commits of related-but-independent changes (could be OK)
 - commits with bullet points in their description that bundle unrelated
changes (sad face)
 - slowdown of development (neutral - slow can be good)
 - fewer "quality of life" improvements, since those would add lines of
diff to a PR and are off topic; when they have to be done in a separate PR
they don't get done and they don't get reviewed with the same priority
(extra sad face)

I know I am in the minority. I tend to have a lot of PRs where there
are 2-5 fairly independent commits. It is "to aid code review" but not in
the way you might think: The best size for code review is pretty big,
compared to the best size for commit. A commit is the unit of roll-forward,
roll-back, cherry-pick, etc. Brian's point about commits not being
independently tested is important: this is a tooling issue, but not that
easy to change. Here is why I am not that worried about it: I believe
strongly in a "rollback first" policy to restore greenness, but also that
the rollback change itself must be verified to restore greenness. When a
multi-commit PR fails, you can easily open a revert of the whole PR as well
as reverts of individual suspect commits. The CI for these will finish
around the same time, and if you manage a smaller revert, great! Imagine if
to revert a PR you had to revert _every_ change between HEAD and that PR.
It would restore to a known green state. Yet we don't do this, because we
have technology that makes it unnecessary. Ultimately, single large commits
with bullet points are just an unstructured version of multi-commit PRs. So
I favor the structure. But people seem to be more likely to write good
bullet points than to write independent commits. Perhaps because it is
easier.
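
A rough sketch of the two competing reverts, where M is the PR's merge
commit and C a suspect commit inside it (both hypothetical shas):

  # Revert the whole multi-commit PR; -m 1 keeps the first parent
  # (the old mainline) as the surviving side:
  git revert -m 1 M
  # Revert only the suspect commit inside the PR:
  git revert C
  # Open both as PRs; merge whichever verified revert goes green first.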

So at this point, I think I am OK with a 1 commit per PR policy. I think
the net benefits to our commit history would be good. I have grown tired of
repeating the conversation. Rebase-and-squash edits commit ids in ways that
confuse tools, so I do not favor this. Tooling that merges one commit at a
time (without altering commit id) would also be super cool and not that
hard. It would prevent intermediate results from merging, solving both
problems.
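
A rough sketch of such tooling, assuming the PR branch has been rebased
onto master so every merge is a fast-forward (branch names hypothetical):

  git checkout master
  # Advance master one PR commit at a time, preserving commit ids:
  for c in $(git rev-list --reverse master..pr-branch); do
    git merge --ff-only "$c"
    # real tooling would run CI here and stop before a red commit merges
  done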

Kenn


On Wed, Apr 21, 2021 at 1:25 PM Brian Hulette  wrote:

> I'd argue that the history is almost always "most useful" when one PR ==
> one commit on master. Intermediate commits from a PR may be useful to aid
> code review, but they're not verified by presubmits and thus aren't
> necessarily independently revertible, so I see little value in keeping them
> around on master. In fact if you're breaking up a PR into multiple commits
> to aid code review, it's worth considering if they could/should be
> separately reviewed and verified PRs.
> We could solve the unwanted commit issue if we have a policy to always
> "Squash and Merge" PRs with rare exceptions.
>
> I agree jira/PR titles could be better, I'm not sure what we can do about
> it aside from reminding committers of this responsibility. Perhaps the
> triage process can help catch poorly titled jiras?
>
> On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw 
> wrote:
>
>> +1 to better descriptions for JIRA (and PRs). Thanks for bringing this
>> up.
>>
>> For merging unwanted commits, can we automate a simple check (e.g. with
>> github actions)?
>>
>> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki  wrote:
>>
>>> BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide
>>> [1], I see I was not following this
>>>
>>> > The reviewer should give the LGTM and then request that the author of
>>> the pull request rebase, squash, split, etc, the commits, so that the
>>> history is most useful
>>>
>>>
>>> Thank you for the feedback on this matter! (And I don't think we
>>> should change the contribution guide)
>>>
>>> [1] https://beam.apache.org/contribute/committer-guide/
>>>
>>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía  wrote:
>>> >
>>> > Hello,
>>> >
>>> > I have noticed an ongoing pattern of carelessness around issues/PR
>>> titles and
>>> > descriptions. It is really painful 

Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-21 Thread Brian Hulette
I'd argue that the history is almost always "most useful" when one PR ==
one commit on master. Intermediate commits from a PR may be useful to aid
code review, but they're not verified by presubmits and thus aren't
necessarily independently revertible, so I see little value in keeping them
around on master. In fact if you're breaking up a PR into multiple commits
to aid code review, it's worth considering if they could/should be
separately reviewed and verified PRs.
We could solve the unwanted commit issue if we have a policy to always
"Squash and Merge" PRs with rare exceptions.

I agree jira/PR titles could be better, I'm not sure what we can do about
it aside from reminding committers of this responsibility. Perhaps the
triage process can help catch poorly titled jiras?

On Wed, Apr 21, 2021 at 11:38 AM Robert Bradshaw 
wrote:

> +1 to better descriptions for JIRA (and PRs). Thanks for bringing this up.
>
> For merging unwanted commits, can we automate a simple check (e.g. with
> github actions)?
>
> On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki  wrote:
>
>> BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide
>> [1], I see I was not following this
>>
>> > The reviewer should give the LGTM and then request that the author of
>> the pull request rebase, squash, split, etc, the commits, so that the
>> history is most useful
>>
>>
>> Thank you for the feedback on this matter! (And I don't think we
>> should change the contribution guide)
>>
>> [1] https://beam.apache.org/contribute/committer-guide/
>>
>> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía  wrote:
>> >
>> > Hello,
>> >
>> > I have noticed an ongoing pattern of carelessness around issues/PR
>> titles and
>> > descriptions. It is really painful to see more and more examples like:
>> >
>> > BEAM-12160 Add TODO for fixing the warning
>> > BEAM-12165 Fix ParquetIO
>> > BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use
>> > toMinutes (commit)
>> >
>> > In all these cases, just a bit more detail in the title would be enough
>> > to make other contributors' or reviewers' lives easier, as well as to
>> > give a better project history. What astonishes me, apart from the lack
>> > of care, is that some of those are from Beam committers.
>> >
>> > We have already discussed not paying attention during commit merges,
>> > where some PRs end up merging tons of 'unwanted' fixup commits, and
>> > nothing has changed, so I am wondering if we should maybe just totally
>> > remove that rule (for commits) and also eventually for titles and
>> > descriptions.
>> >
>> > Ismaël
>> >
>> > [1] https://beam.apache.org/contribute/
>>
>>
>>
>> --
>> Regards,
>> Tomo
>>
>


Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-21 Thread Kenneth Knowles
Since the artifacts were changed about 26 hours ago, I intend to leave this
vote open until 46 hours from now. Specifically, around noon my time (US
Pacific) on Friday I will close the vote and finalize the release, if no
problems are discovered.

Kenn

On Wed, Apr 21, 2021 at 12:52 PM Kenneth Knowles  wrote:

> +1 (binding)
>
> I ran the script at
> https://beam.apache.org/contribute/release-guide/#run-validations-using-run_rc_validationsh
> except for the part that requires a GitHub PR, since Cham already did that
> part.
>
> Kenn
>
> On Wed, Apr 21, 2021 at 12:11 PM Valentyn Tymofieiev 
> wrote:
>
>> +1, verified that my previous findings are fixed.
>>
>> On Wed, Apr 21, 2021 at 8:17 AM Chamikara Jayalath 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Ran some Python scenarios and updated the spreadsheet.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Tue, Apr 20, 2021 at 3:39 PM Kenneth Knowles  wrote:
>>>


 On Tue, Apr 20, 2021 at 3:24 PM Robert Bradshaw 
 wrote:

> The artifacts and signatures look good to me. +1 (binding)
>
> (The release branch still has the .dev name, maybe you didn't push?
> https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py
> )
>

 Good point. I'll highlight that I finally implemented the branching
 changes from
 https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E

 The new guide with diagram is here:
 https://beam.apache.org/contribute/release-guide/#tag-a-chosen-commit-for-the-rc

 TL;DR:
  - the release branch continues to be dev/SNAPSHOT for 2.29.0 while the
 main branch is now dev/SNAPSHOT for 2.30.0
  - the RC tag v2.29.0-RC1 no longer lies on the release branch. It is a
 single tagged commit that removes the dev/SNAPSHOT suffix

 Kenn


> On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles 
> wrote:
>
>> Please take another look.
>>
>>  - I re-ran the RC creation script so the source release and wheels
>> are new and built from the RC tag. I confirmed the source zip and wheels
>> have version 2.29.0 (not .dev or -SNAPSHOT).
>>  - I fixed and rebuilt Dataflow worker container images from exactly
>> the RC commit, added dataclasses, with internal changes to get the 
>> version
>> to match.
>>  - I confirmed that the staged jars already have version 2.29.0 (not
>> -SNAPSHOT).
>>  - I confirmed with `diff -r -q` that the source tarball matches the
>> RC tag (minus the .git* files and directories and gradlew)
>>
>> Kenn
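
For reference, that check is roughly the following (directory names
hypothetical):

  diff -r -q --exclude='.git*' --exclude='gradlew*' \
      apache-beam-2.29.0-source/ beam-v2.29.0-RC1-clone/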
>>
>> On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles 
>> wrote:
>>
>>> At this point, the release train has just about come around to
>>> 2.30.0 which will pick up that change. I don't think it makes sense to
>>> cherry-pick anything more into 2.29.0 unless it is nonfunctional. As it 
>>> is,
>>> I think we have a good commit and just need to build the expected
>>> artifacts. Since it isn't all the artifacts, I was planning on just
>>> overwriting the RC1 artifacts in question and re-verify. I could also 
>>> roll
>>> a new RC2 from the same commit fairly easily.
>>>
>>> Kenn
>>>
>>> On Mon, Apr 19, 2021 at 8:57 PM Reuven Lax  wrote:
>>>
 Any chance we could include
 https://github.com/apache/beam/pull/14548?

 On Mon, Apr 19, 2021 at 8:54 PM Kenneth Knowles 
 wrote:

> To clarify: I am running and fixing the release scripts on the
> `master` branch. They work from fresh clones of the RC tag so this 
> should
> work in most cases. The exception is the GitHub Actions configuration,
> which I cherrypicked
> to the release branch.
>
> Kenn
>
> On Mon, Apr 19, 2021 at 8:34 PM Kenneth Knowles 
> wrote:
>
>> OK it sounds like I need to re-roll the artifacts in question. I
>> don't think anything raised here indicates a problem with the tagged
>> commit, but with the state of the release scripts at the time I 
>> built the
>> earlier artifacts.
>>
>> On Mon, Apr 19, 2021 at 1:03 PM Robert Bradshaw <
>> rober...@google.com> wrote:
>>
>>> It looks like the wheels are also versioned "2.29.0.dev".
>>>
>>> Not sure if it's important, but the source tarball also seems to
>>> contain some release script changes that are not reflected in the 
>>> github
>>> branch.
>>>
>>> On Mon, Apr 19, 2021 at 8:41 AM Kenneth Knowles 
>>> wrote:
>>>
 Thanks for the details, Valentyn & Cham. I will fix the
 Dataflow worker containers then update this thread.

 Kenn

 On Mon, Apr 19, 2021 at 

Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-21 Thread Kenneth Knowles
+1 (binding)

I ran the script at
https://beam.apache.org/contribute/release-guide/#run-validations-using-run_rc_validationsh
except for the part that requires a GitHub PR, since Cham already did that
part.

Kenn

On Wed, Apr 21, 2021 at 12:11 PM Valentyn Tymofieiev 
wrote:

> +1, verified that my previous findings are fixed.
>
> On Wed, Apr 21, 2021 at 8:17 AM Chamikara Jayalath 
> wrote:
>
>> +1 (binding)
>>
>> Ran some Python scenarios and updated the spreadsheet.
>>
>> Thanks,
>> Cham
>>
>> On Tue, Apr 20, 2021 at 3:39 PM Kenneth Knowles  wrote:
>>
>>>
>>>
>>> On Tue, Apr 20, 2021 at 3:24 PM Robert Bradshaw 
>>> wrote:
>>>
 The artifacts and signatures look good to me. +1 (binding)

 (The release branch still has the .dev name, maybe you didn't push?
 https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py
 )

>>>
>>> Good point. I'll highlight that I finally implemented the branching
>>> changes from
>>> https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E
>>>
>>> The new guide with diagram is here:
>>> https://beam.apache.org/contribute/release-guide/#tag-a-chosen-commit-for-the-rc
>>>
>>> TL;DR:
>>>  - the release branch continues to be dev/SNAPSHOT for 2.29.0 while the
>>> main branch is now dev/SNAPSHOT for 2.30.0
>>>  - the RC tag v2.29.0-RC1 no longer lies on the release branch. It is a
>>> single tagged commit that removes the dev/SNAPSHOT suffix
>>>
>>> Kenn
>>>
>>>
 On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles 
 wrote:

> Please take another look.
>
>  - I re-ran the RC creation script so the source release and wheels
> are new and built from the RC tag. I confirmed the source zip and wheels
> have version 2.29.0 (not .dev or -SNAPSHOT).
>  - I fixed and rebuilt Dataflow worker container images from exactly
> the RC commit, added dataclasses, with internal changes to get the version
> to match.
>  - I confirmed that the staged jars already have version 2.29.0 (not
> -SNAPSHOT).
>  - I confirmed with `diff -r -q` that the source tarball matches the
> RC tag (minus the .git* files and directories and gradlew)
>
> Kenn
>
> On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles 
> wrote:
>
>> At this point, the release train has just about come around to 2.30.0
>> which will pick up that change. I don't think it makes sense to 
>> cherry-pick
>> anything more into 2.29.0 unless it is nonfunctional. As it is, I think 
>> we
>> have a good commit and just need to build the expected artifacts. Since 
>> it
>> isn't all the artifacts, I was planning on just overwriting the RC1
>> artifacts in question and re-verify. I could also roll a new RC2 from the
>> same commit fairly easily.
>>
>> Kenn
>>
>> On Mon, Apr 19, 2021 at 8:57 PM Reuven Lax  wrote:
>>
>>> Any chance we could include
>>> https://github.com/apache/beam/pull/14548?
>>>
>>> On Mon, Apr 19, 2021 at 8:54 PM Kenneth Knowles 
>>> wrote:
>>>
 To clarify: I am running and fixing the release scripts on the
 `master` branch. They work from fresh clones of the RC tag so this 
 should
 work in most cases. The exception is the GitHub Actions configuration,
 which I cherrypicked
 to the release branch.

 Kenn

 On Mon, Apr 19, 2021 at 8:34 PM Kenneth Knowles 
 wrote:

> OK it sounds like I need to re-roll the artifacts in question. I
> don't think anything raised here indicates a problem with the tagged
> commit, but with the state of the release scripts at the time I built 
> the
> earlier artifacts.
>
> On Mon, Apr 19, 2021 at 1:03 PM Robert Bradshaw <
> rober...@google.com> wrote:
>
>> It looks like the wheels are also versioned "2.29.0.dev".
>>
>> Not sure if it's important, but the source tarball also seems to
>> contain some release script changes that are not reflected in the 
>> github
>> branch.
>>
>> On Mon, Apr 19, 2021 at 8:41 AM Kenneth Knowles 
>> wrote:
>>
>>> Thanks for the details, Valentyn & Cham. I will fix the Dataflow
>>> worker containers then update this thread.
>>>
>>> Kenn
>>>
>>> On Mon, Apr 19, 2021 at 8:36 AM Kenneth Knowles 
>>> wrote:
>>>


 On Fri, Apr 16, 2021 at 3:42 AM Elliotte Rusty Harold <
 elh...@ibiblio.org> wrote:

> On Fri, Apr 16, 2021 at 4:02 AM Kenneth Knowles <
> k...@apache.org> wrote:
>
> > The complete staging area is available for your review,
> which includes:
> > * JIRA release 

Re: Performance tests dashboard not working

2021-04-21 Thread Ismaël Mejía
Seems to be a networking issue on my side: the dashboards fail in Firefox
with some weird timeout but work perfectly in Chrome.
Thanks for confirming, Andrew.

On Wed, Apr 21, 2021 at 6:45 PM Andrew Pilloud  wrote:
>
> Looks like it is working now?
>
> On Wed, Apr 21, 2021 at 7:34 AM Ismaël Mejía  wrote:
>>
>> Following the conversation on the performance regression on Flink
>> runner I wanted to take a look at the performance dashboards (Nexmark
>> + Load Tests) but when I open the dashboards it says there is a
>> connectivity error "NetworkError when attempting to fetch resource.".
>> Can someone with more knowledge about our CI / dashboards infra please
>> take a look.
>>
>> http://104.154.241.245/d/ahudA_zGz/nexmark?orgId=1


Re: Flink runner configuration for closure cleaner

2021-04-21 Thread Raman Gupta
On Wed, Apr 14, 2021 at 3:21 PM Kyle Weaver  wrote:

> Looking at the code, `--flink-conf-dir` is only read when executing the
>> runner directly. When running via the pipeline, there seems to be no way to
>> set that value.
>
>
> You're right, --flink-conf-dir is not usable in all situations. Can you
> try setting the FLINK_CONF_DIR environment variable to the directory
> containing your flink.conf instead?
>
>
That did work, thank you.
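
For anyone else hitting FLINK-15773, the work-around looks roughly like
this (the path is hypothetical; the closure-cleaner levels are per the
Flink docs):

  # /path/to/conf/flink-conf.yaml contains:
  #   pipeline.closure-cleaner-level: NONE   # or TOP_LEVEL / RECURSIVE
  export FLINK_CONF_DIR=/path/to/conf
  # then run the Beam pipeline on the Flink runner as usual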


>
>> On Tue, Apr 6, 2021 at 5:39 PM Kyle Weaver  wrote:
>>
>>> I don't think this will require Beam to have its own configuration
>>> option. You should be able to set the property
>>> "pipeline.closure-cleaner-level" in your flink.conf and then pass it to
>>> Beam using Beam's "--flink-conf-dir" pipeline option.
>>>
>>> On Tue, Apr 6, 2021 at 2:28 PM Raman Gupta 
>>> wrote:
>>>
 Hello all: I created https://issues.apache.org/jira/browse/BEAM-12055
 because I'm having an issue with using the Flink runner locally, due to
 https://issues.apache.org/jira/browse/FLINK-15773.

 Does anyone see any reason why Beam's Flink runner should not provide a
 configuration option that can disable the Flink closure cleaner? I'm not
 familiar with this myself, and the only reason I would use it is as a
 work-around for FLINK-15773 in my local dev environment.

 Regards,
 Raman




Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-21 Thread Valentyn Tymofieiev
+1, verified that my previous findings are fixed.

On Wed, Apr 21, 2021 at 8:17 AM Chamikara Jayalath 
wrote:

> +1 (binding)
>
> Ran some Python scenarios and updated the spreadsheet.
>
> Thanks,
> Cham
>
> On Tue, Apr 20, 2021 at 3:39 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Tue, Apr 20, 2021 at 3:24 PM Robert Bradshaw 
>> wrote:
>>
>>> The artifacts and signatures look good to me. +1 (binding)
>>>
>>> (The release branch still has the .dev name, maybe you didn't push?
>>> https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py
>>> )
>>>
>>
>> Good point. I'll highlight that I finally implemented the branching
>> changes from
>> https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E
>>
>> The new guide with diagram is here:
>> https://beam.apache.org/contribute/release-guide/#tag-a-chosen-commit-for-the-rc
>>
>> TL;DR:
>>  - the release branch continues to be dev/SNAPSHOT for 2.29.0 while the
>> main branch is now dev/SNAPSHOT for 2.30.0
>>  - the RC tag v2.29.0-RC1 no longer lies on the release branch. It is a
>> single tagged commit that removes the dev/SNAPSHOT suffix
>>
>> Kenn
>>
>>
>>> On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles 
>>> wrote:
>>>
 Please take another look.

  - I re-ran the RC creation script so the source release and wheels are
 new and built from the RC tag. I confirmed the source zip and wheels have
 version 2.29.0 (not .dev or -SNAPSHOT).
  - I fixed and rebuilt Dataflow worker container images from exactly
 the RC commit, added dataclasses, with internal changes to get the version
 to match.
  - I confirmed that the staged jars already have version 2.29.0 (not
 -SNAPSHOT).
  - I confirmed with `diff -r -q` that the source tarball matches the RC
 tag (minus the .git* files and directories and gradlew)

 Kenn

 On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles 
 wrote:

> At this point, the release train has just about come around to 2.30.0
> which will pick up that change. I don't think it makes sense to 
> cherry-pick
> anything more into 2.29.0 unless it is nonfunctional. As it is, I think we
> have a good commit and just need to build the expected artifacts. Since it
> isn't all the artifacts, I was planning on just overwriting the RC1
> artifacts in question and re-verify. I could also roll a new RC2 from the
> same commit fairly easily.
>
> Kenn
>
> On Mon, Apr 19, 2021 at 8:57 PM Reuven Lax  wrote:
>
>> Any chance we could include https://github.com/apache/beam/pull/14548
>> ?
>>
>> On Mon, Apr 19, 2021 at 8:54 PM Kenneth Knowles 
>> wrote:
>>
>>> To clarify: I am running and fixing the release scripts on the
>>> `master` branch. They work from fresh clones of the RC tag so this 
>>> should
>>> work in most cases. The exception is the GitHub Actions configuration,
>>> which I cherrypicked
>>> to the release branch.
>>>
>>> Kenn
>>>
>>> On Mon, Apr 19, 2021 at 8:34 PM Kenneth Knowles 
>>> wrote:
>>>
 OK it sounds like I need to re-roll the artifacts in question. I
 don't think anything raised here indicates a problem with the tagged
 commit, but with the state of the release scripts at the time I built 
 the
 earlier artifacts.

 On Mon, Apr 19, 2021 at 1:03 PM Robert Bradshaw <
 rober...@google.com> wrote:

> It looks like the wheels are also versioned "2.29.0.dev".
>
> Not sure if it's important, but the source tarball also seems to
> contain some release script changes that are not reflected in the 
> github
> branch.
>
> On Mon, Apr 19, 2021 at 8:41 AM Kenneth Knowles 
> wrote:
>
>> Thanks for the details, Valentyn & Cham. I will fix the Dataflow
>> worker containers then update this thread.
>>
>> Kenn
>>
>> On Mon, Apr 19, 2021 at 8:36 AM Kenneth Knowles 
>> wrote:
>>
>>>
>>>
>>> On Fri, Apr 16, 2021 at 3:42 AM Elliotte Rusty Harold <
>>> elh...@ibiblio.org> wrote:
>>>
 On Fri, Apr 16, 2021 at 4:02 AM Kenneth Knowles <
 k...@apache.org> wrote:

 > The complete staging area is available for your review, which
 includes:
 > * JIRA release notes [1],
 > * the official Apache source release to be deployed to
 dist.apache.org [2], which is signed with the key with
 fingerprint 03DBA3E6ABDD04BFD1558DC16ED551A8AE02461C [3],
 > * all artifacts to be deployed to the Maven Central
 Repository [4],
 > * source code tag "v2.29.0-RC1" [5],
 > * website pull request listing the release 

Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-21 Thread Robert Bradshaw
+1 to better descriptions for JIRA (and PRs). Thanks for bringing this up.

For merging unwanted commits, can we automate a simple check (e.g. with
github actions)?
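
A hypothetical (untested) sketch of such a check, as a workflow that fails
a PR containing fixup!/squash! commits:

  # .github/workflows/no-fixup-commits.yml (hypothetical)
  name: no-fixup-commits
  on: pull_request
  jobs:
    check:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v2
          with:
            fetch-depth: 0
        - name: Fail on fixup!/squash! commits
          run: |
            if git log --format='%s' "origin/${{ github.base_ref }}..HEAD" \
                | grep -E '^(fixup|squash)!'; then
              echo "Please squash fixup commits before merging."
              exit 1
            fi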

On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki  wrote:

> BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide
> [1], I see I was not following this
>
> > The reviewer should give the LGTM and then request that the author of
> the pull request rebase, squash, split, etc, the commits, so that the
> history is most useful
>
>
> Thank you for the feedback on this matter! (And I don't think we
> should change the contribution guide)
>
> [1] https://beam.apache.org/contribute/committer-guide/
>
> On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía  wrote:
> >
> > Hello,
> >
> > I have noticed an ongoing pattern of carelessness around issues/PR
> titles and
> > descriptions. It is really painful to see more and more examples like:
> >
> > BEAM-12160 Add TODO for fixing the warning
> > BEAM-12165 Fix ParquetIO
> > BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use
> > toMinutes (commit)
> >
> > In all these cases, just a bit more detail in the title would be enough
> > to make other contributors' or reviewers' lives easier, as well as to
> > give a better project history. What astonishes me, apart from the lack
> > of care, is that some of those are from Beam committers.
> >
> > We have already discussed not paying attention during commit merges,
> > where some PRs end up merging tons of 'unwanted' fixup commits, and
> > nothing has changed, so I am wondering if we should maybe just totally
> > remove that rule (for commits) and also eventually for titles and
> > descriptions.
> >
> > Ismaël
> >
> > [1] https://beam.apache.org/contribute/
>
>
>
> --
> Regards,
> Tomo
>


Re: Question about transformOverride

2021-04-21 Thread Xinyu Liu
Yes, our recommendation to the users is the same as you described, passing
in the inputs as parameters. This fits many use cases. We also have users
wrapping IOs in their own composite transforms and/or having inputs
scattered in different modules/libraries. Passing in inputs doesn't work
well in these use cases, so we are thinking about whether we can provide a
way to override the input transform. From the discussion above, it seems
the current TransformOverride is not intended for this usage. I will take a
closer look at the code to see whether this is achievable (and see whether
we need more dependencies from the core). It seems that in portable
pipelines Fuse can manipulate the structure of the pipeline, but I am not
sure about the Java pipeline.

Thanks,
Xinyu

On Wed, Apr 21, 2021 at 9:40 AM Robert Burke  wrote:

> My general answer for this is to avoid bundling the IOs with the rest of
> the pipeline. Have the Input collection be a parameter to a function that
> constructs the rest of the pipeline, which returns its intended
> PCollections as outputs.
>
> No need to go as far as wrap the whole construction function as a
> Composite, but that's similar.
>
> Runners providing features to make it easier to test the way you describe
> does sound very useful, though it would require the runner to be aware of
> each transform to be overridden, possibly increasing the runner's
> dependency surface.
>
> On Wed, Apr 21, 2021, 9:31 AM Xinyu Liu  wrote:
>
>> @Chamikara: Yuhong and I are working on Samza Runner, and we are looking
>> for a way to swap the transform for ease of use in testing.
>>
>> @Reuven: Your approach will work for this case, but we do have quite a
>> few read transforms here, and we would have to plug this code in for each
>> of them, with some testing logic there too. It seems not very clean to me
>> to have testing code mixed with real logic, and it will be hard to
>> maintain in the long run if we add more read transforms in the future. It
>> would be much nicer if we could leverage something like TransformOverrides
>> to replace a transform entirely without messing around with the existing
>> code.
>>
>> Thanks,
>> Xinyu
>>
>> On Tue, Apr 20, 2021 at 10:00 PM Boyuan Zhang  wrote:
>>
>>> +1 to use pipeline options.
>>>
>>>  Alternatively, you can also change your KafkaReadTransform to perform
>>> different expansion(override expand()) based on your pipeline options.
>>>
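
A rough sketch of this expansion-switch approach, assuming a hypothetical
LocalTestOptions interface with a getLocalTest() flag (the file path and
expandKafkaRead helper are also hypothetical):

  class KafkaReadTransform extends PTransform<PBegin, PCollection<String>> {
    @Override
    public PCollection<String> expand(PBegin input) {
      LocalTestOptions opts =
          input.getPipeline().getOptions().as(LocalTestOptions.class);
      if (opts.getLocalTest()) {
        // In local tests, expand to a file read instead of Kafka.
        return input.apply(TextIO.read().from("/tmp/test-input.txt"));
      }
      return expandKafkaRead(input); // the real Kafka expansion (elided)
    }
  }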
>>> On Tue, Apr 20, 2021 at 9:51 PM Reuven Lax  wrote:
>>>
 It would be simpler to create a custom pipeline option, and swap out
 the read transform in your code. For example

 PCollection pc;
 if (options.getLocalTest()) {
   pc = pipeline.apply(new ReadFromLocalFile());
 } else {
   pc = pipeline.apply(new KafkaReadTransform());
 }

 pc.apply(/* rest of pipeline */);

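
For completeness, a sketch of the custom option this assumes (the interface
name is hypothetical; the annotations are from org.apache.beam.sdk.options):

  public interface LocalTestOptions extends PipelineOptions {
    @Description("Swap the Kafka read for a local-file read in tests.")
    @Default.Boolean(false)
    boolean getLocalTest();
    void setLocalTest(boolean value);
  }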
 On Tue, Apr 20, 2021 at 9:41 PM Yuhong Cheng 
 wrote:

> We want to support transform override when doing tests locally.  For
> example, in real pipelines, we read from Kafka, but when doing tests
> locally, we want to read from a local file to help test whether the
> pipeline works fine. So we want to override the Kafka read transform
> directly instead of writing the pipeline twice.
>
> code example:
>
> public Pipeline createPipeline(Pipeline pipeline) {
>   pipeline.apply(new KafkaReadTransform()).apply(/* other transforms */);
>   return pipeline;
> }
> In test, we will use the same createPipeline() function to create a
> pipeline but meanwhile we want to override KafkaReadTransform with another
> transform to avoid reading from Kafka.
>
> Thanks,
> Yuhong
>
>
>
>
>
>
>
>
>
> On Tue, Apr 20, 2021 at 9:02 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> In general, TransformOverrides are expected to be per-runner
>> implementation details and are not expected to be directly used by
>> end-users.
>> What is the exact use-case you are trying to achieve ? Are you
>> running into a missing feature of an existing transform ?
>>
>> Thanks,
>> Cham
>>
>> On Tue, Apr 20, 2021 at 5:58 PM Yuhong Cheng <
>> mabelyuhong0...@gmail.com> wrote:
>>
>>> Hi Beam,
>>> We have a use case where, when creating a pipeline, we want to replace
>>> the IO read/write transform for testing, using
>>> `pipeline.replaceAll(overrides)`.
>>> However, we met some problems when doing tests:
>>> 1. Are there any ways we can avoid calling expand() of a transform
>>> when it is going to be replaced? The reason we want to override a
>>> transform is that the expand() of this transform is somehow not
>>> available in some situations. It does not seem reasonable to call the
>>> expand() of the originalTransform and then call the expand() of the
>>> overrideTransform again?
>>> 2. When trying to implement `PTransformOverrideFactory`, we realize
>>> that 

Re: Performance tests dashboard not working

2021-04-21 Thread Andrew Pilloud
Looks like it is working now?

On Wed, Apr 21, 2021 at 7:34 AM Ismaël Mejía  wrote:

> Following the conversation on the performance regression on Flink
> runner I wanted to take a look at the performance dashboards (Nexmark
> + Load Tests) but when I open the dashboards it says there is a
> connectivity error "NetworkError when attempting to fetch resource.".
> Can someone with more knowledge about our CI / dashboards infra please
> take a look.
>
> http://104.154.241.245/d/ahudA_zGz/nexmark?orgId=1
>


Re: Question about transformOverride

2021-04-21 Thread Robert Burke
My general answer for this is to avoid bundling the IOs with the rest of
the pipeline. Have the Input collection be a parameter to a function that
constructs the rest of the pipeline, which returns its intended
PCollections as outputs.

No need to go as far as wrap the whole construction function as a
Composite, but that's similar.

Runners providing features to make it easier to test the way you describe
does sound very useful, though it would require the runner to be aware of
each transform to be overridden, possibly increasing the runner's
dependency surface.
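
A minimal sketch of that pattern, assuming the usual Beam imports
(MapElements, TypeDescriptors, Create); the transform body is a
placeholder:

  // Build everything downstream of the input collection.
  static PCollection<String> buildRest(PCollection<String> input) {
    return input.apply(MapElements
        .into(TypeDescriptors.strings())
        .via(s -> s.toUpperCase())); // stands in for the real logic
  }

  // Production: buildRest(p.apply(new KafkaReadTransform()));
  // In a test:  buildRest(p.apply(Create.of("a", "b", "c")));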

On Wed, Apr 21, 2021, 9:31 AM Xinyu Liu  wrote:

> @Chamikara: Yuhong and I are working on Samza Runner, and we are looking
> for a way to swap the transform for ease of use in testing.
>
> @Reuven: Your approach will work for this case, but we do have quite a
> few read transforms here, and we would have to plug this code in for each
> of them, with some testing logic there too. It seems not very clean to me
> to have testing code mixed with real logic, and it will be hard to
> maintain in the long run if we add more read transforms in the future. It
> would be much nicer if we could leverage something like TransformOverrides
> to replace a transform entirely without messing around with the existing
> code.
>
> Thanks,
> Xinyu
>
> On Tue, Apr 20, 2021 at 10:00 PM Boyuan Zhang  wrote:
>
>> +1 to use pipeline options.
>>
>>  Alternatively, you can also change your KafkaReadTransform to perform
>> different expansion(override expand()) based on your pipeline options.
>>
>> On Tue, Apr 20, 2021 at 9:51 PM Reuven Lax  wrote:
>>
>>> It would be simpler to create a custom pipeline option, and swap out the
>>> read transform in your code. For example
>>>
>>> PCollection pc;
>>> if (options.getLocalTest()) {
>>>   pc = pipeline.apply(new ReadFromLocalFile());
>>> } else {
>>>   pc = pipeline.apply(new KafkaReadTransform());
>>> }
>>>
>>> pc.apply(/* rest of pipeline */);
>>>
>>> On Tue, Apr 20, 2021 at 9:41 PM Yuhong Cheng 
>>> wrote:
>>>
 We want to support transform override when doing tests locally.  For
 example, in real pipelines, we read from Kafka, but when doing tests
 locally, we want to read from a local file to help test whether the
 pipeline works fine. So we want to override the Kafka read transform
 directly instead of writing the pipeline twice.

 code example:

 public Pipeline createPipeline(Pipeline pipeline) {
   pipeline.apply(new KafkaReadTransform()).apply(/* other transforms */);
   return pipeline;
 }
 In test, we will use the same createPipeline() function to create a
 pipeline but meanwhile we want to override KafkaReadTransform with another
 transform to avoid reading from Kafka.

 Thanks,
 Yuhong









 On Tue, Apr 20, 2021 at 9:02 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> In general, TransformOverrides are expected to be per-runner
> implementation details and are not expected to be directly used by
> end-users.
> What is the exact use-case you are trying to achieve ? Are you running
> into a missing feature of an existing transform ?
>
> Thanks,
> Cham
>
> On Tue, Apr 20, 2021 at 5:58 PM Yuhong Cheng <
> mabelyuhong0...@gmail.com> wrote:
>
>> Hi Beam,
>> We have a use case where, when creating a pipeline, we want to replace
>> the IO read/write transform for testing, using
>> `pipeline.replaceAll(overrides)`.
>> However, we met some problems when doing tests:
>> 1. Are there any ways we can avoid calling expand() of a transform
>> when it is going to be replaced? The reason we want to override a
>> transform is that the expand() of this transform is somehow not
>> available in some situations. It does not seem reasonable to call the
>> expand() of the originalTransform and then call the expand() of the
>> overrideTransform again?
>> 2. When trying to implement `PTransformOverrideFactory`, we realize
>> that the inputs are `TaggedPValue`, which can only make {Tuple,
>> PCollection} pairs. Then if we want to override a write transform whose
>> output type is `PDone`, what's the best way to implement this factory?
>>
>>
>> Thanks in advance for answers! This is quite important to our
>> pipelines.
>>
>> Thanks,
>> Yuhong
>>
>


Re: Question about transformOverride

2021-04-21 Thread Xinyu Liu
@Chamikara: Yuhong and I are working on Samza Runner, and we are looking
for a way to swap the transform for ease of use in testing.

@Reuven: Your approach will work for this case, but we do have quite a few
read transforms here, and we would have to plug this code in for each of
them, with some testing logic there too. It seems not very clean to me to
have testing code mixed with real logic, and it will be hard to maintain in
the long run if we add more read transforms in the future. It would be much
nicer if we could leverage something like TransformOverrides to replace a
transform entirely without messing around with the existing code.
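Something along these lines is what I have in mind (a rough sketch:
LocalReadOverrideFactory stands for a PTransformOverrideFactory we would
still have to implement, and KafkaReadTransform is our own transform):

```java
// Sketch only: replace every KafkaReadTransform in an already-constructed
// pipeline before running it in a test.
pipeline.replaceAll(
    java.util.Collections.singletonList(
        org.apache.beam.sdk.runners.PTransformOverride.of(
            // PTransformMatcher has a single abstract method, so a lambda works.
            application -> application.getTransform() instanceof KafkaReadTransform,
            new LocalReadOverrideFactory())));
```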

Thanks,
Xinyu

On Tue, Apr 20, 2021 at 10:00 PM Boyuan Zhang  wrote:

> +1 to use pipeline options.
>
>  Alternatively, you can also change your KafkaReadTransform to perform a
> different expansion (override expand()) based on your pipeline options.
>
> On Tue, Apr 20, 2021 at 9:51 PM Reuven Lax  wrote:
>
>> It would be simpler to create a custom pipeline option, and swap out the
>> read transform in your code. For example
>>
>> PCollection pc;
>> if (options.getLocalTest()) {
>>   pc = pipeline.apply(new ReadFromLocalFile());
>> } else {
>>   pc = pipeline.apply(new KafkaReadTransform());
>> }
>>
>> pc.apply(/* rest of pipeline */);
>>
>> On Tue, Apr 20, 2021 at 9:41 PM Yuhong Cheng 
>> wrote:
>>
>>> We want to support transform override when doing tests locally.  For
>>> example, in real pipelines, we read from Kafka, but when doing tests
>>> locally, we want to read from a local file to help test whether the
>>> pipeline works fine. So we want to override the Kafka read transform
>>> directly instead of writing the pipeline twice.
>>>
>>> code example:
>>>
>>> public Pipeline createPipeline(Pipeline pipeline) {
>>>
>>>    pipeline.apply(new KafkaReadTransform()).apply(/* other transforms... */);
>>>    return pipeline;
>>> }
>>> In test, we will use the same createPipeline() function to create a
>>> pipeline, but we want to override KafkaReadTransform with another
>>> transform to avoid reading from Kafka.
>>>
>>> Thanks,
>>> Yuhong
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Apr 20, 2021 at 9:02 PM Chamikara Jayalath 
>>> wrote:
>>>
 In general, TransformOverrides are expected to be per-runner
 implementation details and are not expected to be directly used by
 end-users.
 What is the exact use case you are trying to achieve? Are you running
 into a missing feature of an existing transform?

 Thanks,
 Cham

 On Tue, Apr 20, 2021 at 5:58 PM Yuhong Cheng 
 wrote:

> Hi Beam,
> We have a use case where, when creating a pipeline, we want to replace the
> IO read/write transforms during testing using `pipeline.replaceAll(overrides)`.
> However, we ran into some problems when doing tests:
> 1. Is there any way we can avoid calling expand() of a transform when it
> is going to be replaced? The reason we want to override a transform is
> that its expand() is somehow not available in some situations. It does not
> seem reasonable to call the expand() of the originalTransform and then
> call the expand() of the overrideTransform again?
> 2. When trying to implement `PTransformOverrideFactory`, we realize
> that the inputs are `TaggedPValue`, which can only make {Tuple,
> PCollection} pairs. Then if we want to override a write transform whose
> output type is `PDone`, what's the best way to implement this factory?
>
>
> Thanks in advance for answers! This is quite important to our
> pipelines.
>
> Thanks,
> Yuhong
>



Re: Error comparing flat schema against schema inferred from protoclass

2021-04-21 Thread Reuven Lax
In protobuf, nullability corresponds to optional vs. required fields.
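On the Beam schema side, that distinction shows up as nullable vs.
non-nullable fields. A minimal sketch, with made-up field names:

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.Schema.FieldType;

class SchemaNullabilityExample {
  // An optional proto field infers a nullable Beam field; a required one
  // (and a plain proto3 scalar, per the errors earlier in this thread)
  // infers NOT NULL.
  static final Schema EXAMPLE =
      Schema.builder()
          .addField("name", FieldType.STRING)             // STRING NOT NULL
          .addNullableField("nickname", FieldType.STRING) // STRING (nullable)
          .build();
}
```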

On Wed, Apr 21, 2021 at 8:36 AM Fernando Morales Martinez <
fernando.mora...@wizeline.com> wrote:

> Thanks for the insight, Robert!
> The Java class is autogenerated from a protobuf schema. I'm not very
> knowledgeable regarding proto, but it looks like NOT NULLness can be
> achieved through a *oneof kind* or by implementing a new data type that
> wraps around the desired data. What are your thoughts on that?
> I tried adding the NOT NULL keyword to each column when creating the
> tables [1] and [2], but that failed when executing the *executeDdl*
> method.
>
> Is it worth trying the *oneof kind* approach, or is there a simpler way
> I'm not seeing to accomplish the NOT NULLness?
>
> Thanks again!
>
> On Tue, Apr 20, 2021 at 2:08 PM Robert Burke  wrote:
>
>> It looks like it doesn't consider TYPE to match TYPE NOT NULL. I don't
>> know how Beam Java handles that, but I'd guess you'd need to annotate the
>> fields somehow to make them match.
>>
>> On Tue, Apr 20, 2021, 12:24 PM Fernando Morales Martinez <
>> fernando.mora...@wizeline.com> wrote:
>>
>>> sure thing!
>>> This is the error from method *testSQLInsertRowsToPubsubFlat*
>>>
>>> Given message schema: 'Fields:
>>> Field{name=name, description=, type=STRING, options={{}}}
>>> Field{name=height, description=, type=INT32, options={{}}}
>>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>>> Options:{{}}'
>>> does not match schema inferred from protobuf class.
>>> Protobuf class:
>>> 'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
>>> Inferred schema: 'Fields:
>>> Field{name=name, description=, type=STRING NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=1
>>> Field{name=height, description=, type=INT32 NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=2
>>> Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=3
>>> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
>>> value=NameHeightKnowsJSMessage}}}'
>>>
>>> And this is the message from
>>> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*.
>>>
>>> Given message schema: 'Fields:
>>> Field{name=name, description=, type=STRING, options={{}}}
>>> Field{name=height, description=, type=INT32, options={{}}}
>>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>>> Options:{{}}'
>>> does not match schema inferred from protobuf class.
>>> Protobuf class:
>>> 'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
>>> Inferred schema: 'Fields:
>>> Field{name=name, description=, type=STRING NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=1
>>> Field{name=height, description=, type=INT32 NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=2
>>> Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=3
>>> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
>>> value=NameHeightKnowsJSMessage}}}'
>>>
>>> Maybe the proto variable *knowsJs* should be called *knowsJavascript*?
>>>
>>>
>>> On Tue, Apr 20, 2021 at 12:26 PM Daniel Collins 
>>> wrote:
>>>
 The error includes "NameHeightMessage". Can you provide an example
 error from one of the other methods?

 On Tue, Apr 20, 2021 at 1:56 PM Fernando Morales Martinez <
 fernando.mora...@wizeline.com> wrote:

> Thanks for the heads up, Daniel!
>
> I missed changing that one, but even after making the change, I'm
> getting the same error.
>
> The other two methods, *testSQLInsertRowsToPubsubFlat* and
> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*, were already
> using NameHeightKnowsJSMessage class but are still throwing the same 
> error.
>
> Any idea what else might be going on?
>
> On Tue, Apr 20, 2021 at 11:13 AM Daniel Collins 
> wrote:
>
>> Thanks for working on this! It looks to me like the schemas don't
>> match: you appear to be using NameHeightMessage defined as:
>>
>> ```
>> message NameHeightMessage {
>>   string name = 1;
>>   int32 height = 2;
>> }
>> ```
>>
>> And expecting it to work with a table schema that 

Re: Error comparing flat schema against schema inferred from protoclass

2021-04-21 Thread Daniel Collins
I think the issue might actually be a field name mismatch. It looks like
you have "knowsJavascript" in the table, but "knowsJs" in the proto.

You should be able to go from NOT NULL -> NULL without issue.

On Wed, Apr 21, 2021 at 11:36 AM Fernando Morales Martinez <
fernando.mora...@wizeline.com> wrote:

> Thanks for the insight, Robert!
> The Java class is autogenerated from a protobuf schema. I'm not very
> knowledgeable regarding proto, but it looks like NOT NULLness can be
> achieved through a *oneof kind* or by implementing a new data type that
> wraps around the desired data. What are your thoughts on that?
> I tried adding the NOT NULL keyword to each column when creating the
> tables [1] and [2], but that failed when executing the *executeDdl*
> method.
>
> Is it worth trying the *oneof kind* approach, or is there a simpler way
> I'm not seeing to accomplish the NOT NULLness?
>
> Thanks again!
>
> On Tue, Apr 20, 2021 at 2:08 PM Robert Burke  wrote:
>
>> It looks like it doesn't consider TYPE to match TYPE NOT NULL. I don't
>> know how Beam Java handles that, but I'd guess you'd need to annotate the
>> fields somehow to make them match.
>>
>> On Tue, Apr 20, 2021, 12:24 PM Fernando Morales Martinez <
>> fernando.mora...@wizeline.com> wrote:
>>
>>> sure thing!
>>> This is the error from method *testSQLInsertRowsToPubsubFlat*
>>>
>>> Given message schema: 'Fields:
>>> Field{name=name, description=, type=STRING, options={{}}}
>>> Field{name=height, description=, type=INT32, options={{}}}
>>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>>> Options:{{}}'
>>> does not match schema inferred from protobuf class.
>>> Protobuf class:
>>> 'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
>>> Inferred schema: 'Fields:
>>> Field{name=name, description=, type=STRING NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=1
>>> Field{name=height, description=, type=INT32 NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=2
>>> Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=3
>>> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
>>> value=NameHeightKnowsJSMessage}}}'
>>>
>>> And this is the message from
>>> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*.
>>>
>>> Given message schema: 'Fields:
>>> Field{name=name, description=, type=STRING, options={{}}}
>>> Field{name=height, description=, type=INT32, options={{}}}
>>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>>> Options:{{}}'
>>> does not match schema inferred from protobuf class.
>>> Protobuf class:
>>> 'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
>>> Inferred schema: 'Fields:
>>> Field{name=name, description=, type=STRING NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=1
>>> Field{name=height, description=, type=INT32 NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=2
>>> Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
>>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>>> value=3
>>> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
>>> value=NameHeightKnowsJSMessage}}}'
>>>
>>> Maybe the proto variable *knowsJs* should be called *knowsJavascript*?
>>>
>>>
>>> On Tue, Apr 20, 2021 at 12:26 PM Daniel Collins 
>>> wrote:
>>>
 The error includes "NameHeightMessage". Can you provide an example
 error from one of the other methods?

 On Tue, Apr 20, 2021 at 1:56 PM Fernando Morales Martinez <
 fernando.mora...@wizeline.com> wrote:

> Thanks for the heads up, Daniel!
>
> I missed changing that one, but even after making the change, I'm
> getting the same error.
>
> The other two methods, *testSQLInsertRowsToPubsubFlat* and
> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*, were already
> using NameHeightKnowsJSMessage class but are still throwing the same 
> error.
>
> Any idea what else might be going on?
>
> On Tue, Apr 20, 2021 at 11:13 AM Daniel Collins 
> wrote:
>
>> Thanks for working on this! It looks to me like the schemas don't
>> match: you appear to be using NameHeightMessage defined as:
>>
>> ```
>> message NameHeightMessage {

Re: Error comparing flat schema against schema inferred from protoclass

2021-04-21 Thread Fernando Morales Martinez
Thanks for the insight, Robert!
The Java class is autogenerated from a protobuf schema. I'm not very
knowledgeable regarding proto, but it looks like NOT NULLness can be
achieved through a *oneof kind* or by implementing a new data type that
wraps around the desired data. What are your thoughts on that?
I tried adding the NOT NULL keyword to each column when creating the tables
[1] and [2], but that failed when executing the *executeDdl* method.

Is it worth trying the *oneof kind* approach, or is there a simpler way I'm
not seeing to accomplish the NOT NULLness?

Thanks again!

On Tue, Apr 20, 2021 at 2:08 PM Robert Burke  wrote:

> It looks like it doesn't consider TYPE to match TYPE NOT NULL. I don't
> know how Beam Java handles that, but I'd guess you'd need to annotate the
> fields somehow to make them match.
>
> On Tue, Apr 20, 2021, 12:24 PM Fernando Morales Martinez <
> fernando.mora...@wizeline.com> wrote:
>
>> sure thing!
>> This is the error from method *testSQLInsertRowsToPubsubFlat*
>>
>> Given message schema: 'Fields:
>> Field{name=name, description=, type=STRING, options={{}}}
>> Field{name=height, description=, type=INT32, options={{}}}
>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>> Options:{{}}'
>> does not match schema inferred from protobuf class.
>> Protobuf class:
>> 'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
>> Inferred schema: 'Fields:
>> Field{name=name, description=, type=STRING NOT NULL,
>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>> value=1
>> Field{name=height, description=, type=INT32 NOT NULL,
>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>> value=2
>> Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>> value=3
>> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
>> value=NameHeightKnowsJSMessage}}}'
>>
>> And this is the message from
>> *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*.
>>
>> Given message schema: 'Fields:
>> Field{name=name, description=, type=STRING, options={{}}}
>> Field{name=height, description=, type=INT32, options={{}}}
>> Field{name=knowsJavascript, description=, type=BOOLEAN, options={{}}}
>> Options:{{}}'
>> does not match schema inferred from protobuf class.
>> Protobuf class:
>> 'org.apache.beam.sdk.extensions.protobuf.PayloadMessages$NameHeightKnowsJSMessage'
>> Inferred schema: 'Fields:
>> Field{name=name, description=, type=STRING NOT NULL,
>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>> value=1
>> Field{name=height, description=, type=INT32 NOT NULL,
>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>> value=2
>> Field{name=knowsJs, description=, type=BOOLEAN NOT NULL,
>> options={{beam:option:proto:meta:number=Option{type=INT32 NOT NULL,
>> value=3
>> Options:{{beam:option:proto:meta:type_name=Option{type=STRING NOT NULL,
>> value=NameHeightKnowsJSMessage}}}'
>>
>> Maybe the proto variable *knowsJs* should be called *knowsJavascript*?
>>
>>
>> On Tue, Apr 20, 2021 at 12:26 PM Daniel Collins 
>> wrote:
>>
>>> The error includes "NameHeightMessage". Can you provide an example error
>>> from one of the other methods?
>>>
>>> On Tue, Apr 20, 2021 at 1:56 PM Fernando Morales Martinez <
>>> fernando.mora...@wizeline.com> wrote:
>>>
 Thanks for the heads up, Daniel!

 I missed changing that one, but even after making the change, I'm
 getting the same error.

 The other two methods, *testSQLInsertRowsToPubsubFlat* and
 *testSQLInsertRowsToPubsubWithTimestampAttributeFlat*, were already
 using NameHeightKnowsJSMessage class but are still throwing the same error.

 Any idea what else might be going on?

 On Tue, Apr 20, 2021 at 11:13 AM Daniel Collins 
 wrote:

> Thanks for working on this! It looks to me like the schemas don't
> match: you appear to be using NameHeightMessage defined as:
>
> ```
> message NameHeightMessage {
>   string name = 1;
>   int32 height = 2;
> }
> ```
>
> And expecting it to work with a table schema that has a "BOOL
> knowsJavascript" field. Did you mean to use the "NameHeightKnowsJSMessage"
> class?
>
> -Daniel
>
> On Tue, Apr 20, 2021 at 1:02 PM Fernando Morales Martinez <
> fernando.mora...@wizeline.com> wrote:
>
>> Sorry for the spam, forgot to add the pertinent 

Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-21 Thread Chamikara Jayalath
+1 (binding)

Ran some Python scenarios and updated the spreadsheet.

Thanks,
Cham

On Tue, Apr 20, 2021 at 3:39 PM Kenneth Knowles  wrote:

>
>
> On Tue, Apr 20, 2021 at 3:24 PM Robert Bradshaw 
> wrote:
>
>> The artifacts and signatures look good to me. +1 (binding)
>>
>> (The release branch still has the .dev name, maybe you didn't push?
>> https://github.com/apache/beam/blob/release-2.29.0/sdks/python/apache_beam/version.py
>> )
>>
>
> Good point. I'll highlight that I finally implemented the branching
> changes from
> https://lists.apache.org/thread.html/205472bdaf3c2c5876533750d417c19b0d1078131a3dc04916082ce8%40%3Cdev.beam.apache.org%3E
>
> The new guide with diagram is here:
> https://beam.apache.org/contribute/release-guide/#tag-a-chosen-commit-for-the-rc
>
> TL;DR:
>  - the release branch continues to be dev/SNAPSHOT for 2.29.0 while the
> main branch is now dev/SNAPSHOT for 2.30.0
>  - the RC tag v2.29.0-RC1 no longer lies on the release branch. It is a
> single tagged commit that removes the dev/SNAPSHOT suffix
>
> Kenn
>
>
>> On Tue, Apr 20, 2021 at 10:36 AM Kenneth Knowles  wrote:
>>
>>> Please take another look.
>>>
>>>  - I re-ran the RC creation script so the source release and wheels are
>>> new and built from the RC tag. I confirmed the source zip and wheels have
>>> version 2.29.0 (not .dev or -SNAPSHOT).
>>>  - I fixed and rebuilt Dataflow worker container images from exactly the
>>> RC commit, added dataclasses, with internal changes to get the version to
>>> match.
>>>  - I confirmed that the staged jars already have version 2.29.0 (not
>>> -SNAPSHOT).
>>>  - I confirmed with `diff -r -q` that the source tarball matches the RC
>>> tag (minus the .git* files and directories and gradlew)
>>>
>>> Kenn
>>>
>>> On Mon, Apr 19, 2021 at 9:19 PM Kenneth Knowles  wrote:
>>>
 At this point, the release train has just about come around to 2.30.0
 which will pick up that change. I don't think it makes sense to cherry-pick
 anything more into 2.29.0 unless it is nonfunctional. As it is, I think we
 have a good commit and just need to build the expected artifacts. Since it
 isn't all the artifacts, I was planning on just overwriting the RC1
 artifacts in question and re-verify. I could also roll a new RC2 from the
 same commit fairly easily.

 Kenn

 On Mon, Apr 19, 2021 at 8:57 PM Reuven Lax  wrote:

> Any chance we could include https://github.com/apache/beam/pull/14548?
>
> On Mon, Apr 19, 2021 at 8:54 PM Kenneth Knowles 
> wrote:
>
>> To clarify: I am running and fixing the release scripts on the
>> `master` branch. They work from fresh clones of the RC tag so this should
>> work in most cases. The exception is the GitHub Actions configuration,
>> which I cherrypicked
>> to the release branch.
>>
>> Kenn
>>
>> On Mon, Apr 19, 2021 at 8:34 PM Kenneth Knowles 
>> wrote:
>>
>>> OK it sounds like I need to re-roll the artifacts in question. I
>>> don't think anything raised here indicates a problem with the tagged
>>> commit, but with the state of the release scripts at the time I built 
>>> the
>>> earlier artifacts.
>>>
>>> On Mon, Apr 19, 2021 at 1:03 PM Robert Bradshaw 
>>> wrote:
>>>
 It looks like the wheels are also versioned "2.29.0.dev".

 Not sure if it's important, but the source tarball also seems to
 contain some release script changes that are not reflected in the 
 github
 branch.

 On Mon, Apr 19, 2021 at 8:41 AM Kenneth Knowles 
 wrote:

> Thanks for the details, Valentyn & Cham. I will fix the Dataflow
> worker containers then update this thread.
>
> Kenn
>
> On Mon, Apr 19, 2021 at 8:36 AM Kenneth Knowles 
> wrote:
>
>>
>>
>> On Fri, Apr 16, 2021 at 3:42 AM Elliotte Rusty Harold <
>> elh...@ibiblio.org> wrote:
>>
>>> On Fri, Apr 16, 2021 at 4:02 AM Kenneth Knowles 
>>> wrote:
>>>
>>> > The complete staging area is available for your review, which
>>> includes:
>>> > * JIRA release notes [1],
>>> > * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with
>>> fingerprint 03DBA3E6ABDD04BFD1558DC16ED551A8AE02461C [3],
>>> > * all artifacts to be deployed to the Maven Central Repository
>>> [4],
>>> > * source code tag "v2.29.0-RC1" [5],
>>> > * website pull request listing the release [6], publishing the
>>> API reference manual [7], and the blog post [8].
>>> > * Java artifacts were built with Maven MAVEN_VERSION and
>>> OpenJDK/Oracle JDK JDK_VERSION.
>>>
>>> Are the MAVEN_VERSION and OpenJDK/Oracle JDK JDK_VERSION
>>> supposed to
>>> be 

Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-21 Thread Tomo Suzuki
BEAM-12173 is on me. I'm sorry about that. Re-reading the committer guide
[1], I see I was not following this:

> The reviewer should give the LGTM and then request that the author of the 
> pull request rebase, squash, split, etc, the commits, so that the history is 
> most useful


Thank you for the feedback on this matter! (And I don't think we
should change the contribution guide)

[1] https://beam.apache.org/contribute/committer-guide/

On Wed, Apr 21, 2021 at 10:35 AM Ismaël Mejía  wrote:
>
> Hello,
>
> I have noticed an ongoing pattern of carelessness around issues/PR titles and
> descriptions. It is really painful to see more and more examples like:
>
> BEAM-12160 Add TODO for fixing the warning
> BEAM-12165 Fix ParquetIO
> BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use
> toMinutes (commit)
>
> In all these cases, just a bit more detail in the title would be enough to
> make other contributors' and reviewers' lives easier, as well as giving the
> project a better history. What astonishes me, apart from the lack of care,
> is that some of those are from Beam committers.
>
> We have already discussed not paying attention during commit merges, where
> some PRs end up merging tons of unwanted fixup commits, and nothing has
> changed, so I am wondering if we should maybe just remove that rule entirely
> for commits, and eventually for titles and descriptions as well.
>
> Ismaël
>
> [1] https://beam.apache.org/contribute/



-- 
Regards,
Tomo


Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-21 Thread Ismaël Mejía
Hello,

I have noticed an ongoing pattern of carelessness around issues/PR titles and
descriptions. It is really painful to see more and more examples like:

BEAM-12160 Add TODO for fixing the warning
BEAM-12165 Fix ParquetIO
BEAM-12173 avoid intermediate conversion (PR) and BEAM-12173 use
toMinutes (commit)

In all these cases, just a bit more detail in the title would be enough to
make other contributors' and reviewers' lives easier, as well as giving the
project a better history. What astonishes me, apart from the lack of care, is
that some of those are from Beam committers.

We have already discussed not paying attention during commit merges, where
some PRs end up merging tons of unwanted fixup commits, and nothing has
changed, so I am wondering if we should maybe just remove that rule entirely
for commits, and eventually for titles and descriptions as well.

Ismaël

[1] https://beam.apache.org/contribute/


Performance tests dashboard not working

2021-04-21 Thread Ismaël Mejía
Following the conversation on the performance regression in the Flink
runner, I wanted to take a look at the performance dashboards (Nexmark
+ Load Tests), but when I open the dashboards it reports a connectivity
error: "NetworkError when attempting to fetch resource.".
Can someone with more knowledge about our CI / dashboards infra please
take a look?

http://104.154.241.245/d/ahudA_zGz/nexmark?orgId=1


Re: GitHub permission denied

2021-04-21 Thread Alexey Romanenko
Hi Iurii,

You can’t push directly into the Beam repository. 

Please create your own fork, push your changes to a feature branch of your
fork, and then create a PR against the Beam main branch.
You can find more details in the step-by-step guide here [1].
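For example (a sketch; substitute your own GitHub username and branch name):

```
# One-time: fork apache/beam in the GitHub UI, then clone your fork.
git clone https://github.com/<your-username>/beam.git
cd beam

# Work on a feature branch.
git checkout -b my-feature
# ... commit your changes ...

# Push the branch to your fork, then open a PR against apache/beam.
git push origin my-feature
```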

Also, the general contribution guide [2] should be very helpful.

Alexey

[1] https://cwiki.apache.org/confluence/display/BEAM/Git+Tips
[2] https://beam.apache.org/contribute/

> On 21 Apr 2021, at 15:02, Iurii Kazanov  wrote:
> 
> Hello! I'm trying to push to 'https://github.com/apache/beam/' but I'm
> getting the message "
> remote: Permission to apache/beam.git denied to iurii-kazanov.
> fatal: unable to access 'https://github.com/apache/beam/': The requested
> URL returned error: 403".
> What do I need to do?



GitHub permission denied

2021-04-21 Thread Iurii Kazanov
Hello! I'm trying to push to 'https://github.com/apache/beam/' but I'm getting
the message "

remote: Permission to apache/beam.git denied to iurii-kazanov.
fatal: unable to access 'https://github.com/apache/beam/': The requested URL
returned error: 403". What do I need to do?